All Papers
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.
Visual Prompting via Partial Optimal Transport
Mengyu Zheng, Zhiwei Hao, Yehui Tang et al.
Visual Redundancy Removal for Composite Images: A Benchmark Dataset and a Multi-Visual-Effects Driven Incremental Method
Miaohui Wang, Rong Zhang, Lirong Huang et al.
Visual Relationship Transformation
Xiaoyu Xu, Jiayan Qiu, Baosheng Yu et al.
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang, Dongyoung Kim, Junsu Kim et al.
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.
Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach
Yancheng Wang, Ping Li, Yingzhen Yang
VITA: ‘Carefully Chosen and Weighted Less’ Is Better in Medication Recommendation
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu et al.
ViT-Calibrator: Decision Stream Calibration for Vision Transformer
Lin Chen, Zhijie Jia, Lechao Cheng et al.
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia, Xinliang Wang, Feng Lv et al.
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng, Chongyu Liu, Yuliang Liu et al.
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei, Yixiao Ge, Kun Yi et al.
ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization
Danning Lao, Qi Liu, Jiazi Bu et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation
Wenjie Zhuo, Fan Ma, Hehe Fan et al.
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black, Jing Shi, Yifei Fan et al.
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
Ahmad Khaliq, Ming Xu, Stephen Hausler et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding
Guibiao Liao, Jiankun Li, Xiaoqing Ye
VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation
Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme et al.
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
Fan Fei, Jiajun Tang, Ping Tan et al.
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin, Junlong Du, Qiang Wang et al.
VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees
Anahita Baninajjar, Ahmed Rezine, Amir Aminifar
Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions
Yongqiang Cai
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu, Jia-Xin Zhuang, Hao Chen
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
Volumetric Environment Representation for Vision-Language Navigation
Liu, Wenguan Wang, Yi Yang
Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma, Daniel Rebain, Kwang Moo Yi et al.
VONet: Unsupervised Video Object Learning With Parallel U-Net Attention and Object-wise Sequential VAE
Haonan Yu, Wei Xu
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
Phong Tran, Egor Zakharov, Long Nhat Ho et al.
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
Pengying Wu, Yao Mu, Bingxian Wu et al.
Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection
Yuhao Huang, Sanping Zhou, Junjie Zhang et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, Haibo Yang et al.
VPDETR: End-to-End Vanishing Point DEtection TRansformers
Taiyan Chen, Xianghua Ying, Jinfa Yang et al.
VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
Zhixue Fang, Yuzhi Liu, Huisi Wu et al.
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu, Zheyuan Yang, Guile Wu et al.
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
Ziyi Yin, Muchao Ye, Tianrong Zhang et al.
VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook
Wenbin Zou, Hongxia Gao, Tian Ye et al.