2025 Papers

21,856 papers found • Page 421 of 438

ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay et al.

NEURIPS 2025oralarXiv:2506.18792

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025posterarXiv:2503.07135
29
citations

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning

Ji Soo Lee, Jongha Kim, Jeehye Na et al.

AAAI 2025paperarXiv:2501.06761
7
citations

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, JunJia Guo, Hang Hua et al.

CVPR 2025posterarXiv:2411.10979
16
citations

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

Zhicheng Zhang, Weicheng Wang, Yongjie Zhu et al.

NEURIPS 2025posterarXiv:2511.02712

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization

Hao Ju, Shaofei Huang, Si Liu et al.

ICCV 2025posterarXiv:2411.13610

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Duo Zheng, Shijia Huang, Liwei Wang

CVPR 2025posterarXiv:2412.00493
65
citations

Video Action Differencing

James Burgess, Xiaohan Wang, Yuhui Zhang et al.

ICLR 2025posterarXiv:2503.07860

VideoAds for Fast-Paced Video Understanding

Zheyuan Zhang, Wanying Dou, Linkai Peng et al.

ICCV 2025posterarXiv:2504.09282
2
citations

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model

Hang Zhou, Jiale Cai, Yuteng Ye et al.

AAAI 2025paperarXiv:2412.09026
14
citations

VideoAuteur: Towards Long Narrative Video Generation

Junfei Xiao, Feng Cheng, Lu Qi et al.

ICCV 2025posterarXiv:2501.06173

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025posterarXiv:2411.13281
14
citations

Video-Bench: Human-Aligned Video Generation Benchmark

Hui Han, Siyuan Li, Jiaqi Chen et al.

CVPR 2025posterarXiv:2504.04907

VideoCAD: A Dataset and Model for Learning Long‑Horizon 3D CAD UI Interactions from Video

King Yiu Brandon Man, Ghadi Nehme, Md Ferdous Alam et al.

NEURIPS 2025posterarXiv:2505.24838

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NEURIPS 2025oralarXiv:2509.21100
13
citations

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025posterarXiv:2503.19009
9
citations

Video Color Grading via Look-Up Table Generation

Seunghyun Shin, Dongmin Shin, Jisu Shin et al.

ICCV 2025posterarXiv:2508.00548
1
citations

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.

CVPR 2025posterarXiv:2504.03970

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Sili Chen, Hengkai Guo, Shengnan Zhu et al.

CVPR 2025highlightarXiv:2501.12375

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025posterarXiv:2411.19189
20
citations

Video Diffusion Models Are Strong Video Inpainter

Minhyeok Lee, Suhwan Cho, Chajin Shin et al.

AAAI 2025paperarXiv:2408.11402
14
citations

Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision

Chenshuang Zhang, Kang Zhang, Joon Son Chung et al.

NEURIPS 2025posterarXiv:2512.02339

VideoDirector: Precise Video Editing via Text-to-Video Models

Yukun Wang, Longguang Wang, Zhiyuan Ma et al.

CVPR 2025posterarXiv:2411.17592

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025posterarXiv:2412.14167
68
citations

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Yabo Zhang, Yuxiang Wei, Xianhui Lin et al.

AAAI 2025paperarXiv:2403.05438

Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

Minting Pan, Yitao Zheng, Jiajian Li et al.

ICML 2025posterarXiv:2505.06482

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi et al.

CVPR 2025posterarXiv:2411.14794
54
citations

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.

NEURIPS 2025posterarXiv:2505.15952
4
citations

VideoGEM: Training-free Action Grounding in Videos

Felix Vogel, Walid Bousselham, Anna Kukleva et al.

CVPR 2025posterarXiv:2503.20348

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Yiran Xu, Taesung Park, Richard Zhang et al.

CVPR 2025posterarXiv:2404.12388

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025posterarXiv:2411.04923
30
citations

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Boqing Gong, Yin Cui, Long Zhao et al.

ICLR 2025oral

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025posterarXiv:2502.17258
31
citations

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025posterarXiv:2411.17698
38
citations

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Dohun Lee, Bryan Sangwoo Kim, Geon Yeong Park et al.

CVPR 2025posterarXiv:2410.04364
2
citations

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NEURIPS 2025posterarXiv:2505.01481
13
citations

VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors

Juil Koo, Paul Guerrero, Chun-Hao P. Huang et al.

CVPR 2025posterarXiv:2503.01107
4
citations

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Kangsan Kim, Geon Park, Youngwan Lee et al.

CVPR 2025posterarXiv:2412.02186
12
citations

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He et al.

ICLR 2025posterarXiv:2407.07356
7
citations

Video Individual Counting for Moving Drones

Yaowu Fan, Jia Wan, Tao Han et al.

ICCV 2025highlightarXiv:2503.10701
3
citations

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Hila Chefer, Uriel Singer, Amit Zohar et al.

ICML 2025oralarXiv:2502.02492

Video Language Model Pretraining with Spatio-temporal Masking

Yue Wu, Zhaobo Qi, Junshu Sun et al.

CVPR 2025poster
1
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025posterarXiv:2409.01071
3
citations

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Jialong Zuo, Yongtai Deng, Lingdong Kong et al.

NEURIPS 2025oralarXiv:2510.12422
2
citations

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.

CVPR 2025posterarXiv:2503.21781
7
citations

VideoMAR: Autoregressive Video Generation with Continuous Tokens

Hu Yu, Biao Gong, Hangjie Yuan et al.

NEURIPS 2025oral
8
citations

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization

Xinye Cao, Hongcan Guo, Jiawen Qian et al.

ICCV 2025posterarXiv:2510.06040

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.

CVPR 2025highlightarXiv:2405.21075
876
citations

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fating Hong et al.

ICCV 2025highlightarXiv:2503.20218
4
citations

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025posterarXiv:2412.07776
18
citations