All Papers
34,598 papers found • Page 685 of 692
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
Xiang Fan, Anand Bhattad, Ranjay Krishna
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Fuchen Long, Zhaofan Qiu, Ting Yao et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu, Yipin Zhou, Bichen Wu et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li, Chao Ma, Xiaokang Yang et al.
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou, Zhe Xue, Yawen Li et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
Haodi He, Colton Stearns, Adam Harley et al.
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang, Lei Wang, Vishal M. Patel et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
Jinke Li, Xiao He, Chonghua Zhou et al.
View From Above: Orthogonal-View Aware Cross-View Localization
Shan Wang, Chuong Nguyen, Jiawei Liu et al.
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
Viewing Transformers Through the Lens of Long Convolutions Layers
Itamar Zimerman, Lior Wolf
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi, Zhonghua Wu, Stefan Lee
Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models
James Burgess, Kuan-Chieh Wang, Serena Yeung-Levy
Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan et al.
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo, Justin Johnson, Honglak Lee
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani, Mohamed Hanini, Nihitha Malayarukil et al.
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang, Junbang Liang, Chun-Kai Wang et al.
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi, Chen Li, Tieliang Gong et al.
VILA: On Pre-training for Visual Language Models
Ji Lin, Danny Yin, Wei Ping et al.
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
İlker Kesen, Andrea Pedrotti, Mustafa Dogan et al.
ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-guided Optimization
Hao Wang, Fang Liu, Licheng Jiao et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
Zhaoliang Wan, Yonggen Ling, Senlin Yi et al.
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu, Maziar Sanjabi, Yi Ma et al.
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Sogand Salehi, Mahdi Shafiei, Roman Bachmann et al.
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Mustikovela et al.
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang, Runyu Ding, Ellis L Brown et al.
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li, Jiuyang Dong, Shenjin Huang et al.
VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Hanjung Kim, Jaehyun Kang, Miran Heo et al.
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, Haochen Wang, Shilin Yan et al.
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich, Niv Nayman, Sharon Fogel et al.
Visible and Clear: Finding Tiny Objects in Difference Map
Bing Cao, Haiyu Yao, Pengfei Zhu et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li, Baotian Hu, Haoyuan Shi et al.
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke, Yuezhou Li et al.
Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
Zihan Zhang, Zhuo Xu, Xiang Xiang
Vision-Language Foundation Models as Effective Robot Imitators
Xinghang Li, Minghuan Liu, Hanbo Zhang et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Taolin Zhang, Sunan He, Tao Dai et al.
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.
Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting
Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.