CVPR Papers
5,589 papers found • Page 110 of 112
Validating Privacy-Preserving Face Recognition under a Minimum Assumption
Hui Zhang, Xingbo Dong, YenLungLai et al.
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu et al.
VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi, Ylva Mellbin, Ci Li et al.
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin, Zhihao Li, Xiao Tang et al.
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang, Yinan He, Jiashuo Yu et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.
Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion
Zhongyin Zhao, Ye Chen, Zhangli Hu et al.
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen, Hao Zheng, Yuemeng LI et al.
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy
Gengyu Zhang, Hao Tang, Yan Yan
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang, Nikita Karaev, Christian Rupprecht et al.
V?: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu, Saining Xie
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He, Wenwu Yang
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun et al.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
XuDong Wang, Ishan Misra, Ziyun Zeng et al.
Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu, Yongjian Deng, Hao Chen et al.
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo, XinYu Han, Jie Zhang et al.
Video Interpolation with Diffusion Models
Siddhant Jain, Daniel Watson, Aleksander Holynski et al.
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu et al.
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang et al.
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava, Abhinav Shrivastava
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang, Kaixin Yao, Chengcheng Guo et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu, Yipin Zhou, Bichen Wu et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li, Chao Ma, Xiaokang Yang et al.
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou, Zhe Xue, Yawen Li et al.
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang, Lei Wang, Vishal M. Patel et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
View From Above: Orthogonal-View aware Cross-view Localization
Shan Wang, Chuong Nguyen, Jiawei Liu et al.
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi, Zhonghua Wu, Stefan Lee
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi, Chen Li, Tieliang Gong et al.
VILA: On Pre-training for Visual Language Models
Ji Lin, Danny Yin, Wei Ping et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Mustikovela et al.
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li, Jiuyang Dong, Shenjin Huang et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma, Xiaojie Jin, Heng Wang et al.
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.