2025 Papers

21,856 papers found • Page 422 of 438


VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.

ICCV 2025 • poster • arXiv:2411.16156

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.

CVPR 2025 • poster • arXiv:2412.18609

Video Perception Models for 3D Scene Synthesis

Rui Huang, Guangyao Zhai, Zuria Bauer et al.

NEURIPS 2025 • poster • arXiv:2506.20601

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025 • poster • arXiv:2406.03520 • 102 citations

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Yucheng Hu, Yanjiang Guo, Pengchao Wang et al.

ICML 2025 • spotlight • arXiv:2412.14803

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025 • oral • arXiv:2503.21776 • 236 citations

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Yongdong Luo, Xiawu Zheng, Guilin Li et al.

NEURIPS 2025 • poster • arXiv:2411.13093

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025 • poster • arXiv:2501.00599

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.

NEURIPS 2025 • oral • arXiv:2505.23656

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Yongliang Wu, Wenbo Zhu, Jiawang Cao et al.

AAAI 2025 • paper • arXiv:2412.08879

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

Hyojun Go, Byeongjun Park, Hyelin Nam et al.

ICCV 2025 • poster • arXiv:2503.15855

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025 • oral • arXiv:2505.12434

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Xilin Wei, Xiaoran Liu, Yuhang Zang et al.

ICML 2025 • oral • arXiv:2502.05173

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Xuannan Liu, Zekun Li, Zheqi He et al.

NEURIPS 2025 • oral • arXiv:2505.11842

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Guangzhi Sun, Yudong Yang, Jimin Zhuang et al.

ICML 2025 • poster • arXiv:2502.11775

Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations

Xin Liu, Haoran Li, Dongbin Zhao

NEURIPS 2025 • poster • arXiv:2512.21586

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025 • highlight • arXiv:2504.01956

VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos

Yue Qiu, Yanjun Sun, Takuma Yagi et al.

ICCV 2025 • poster

VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking

Runyi Hu, Jie Zhang, Yiming Li et al.

ICLR 2025 • oral • arXiv:2501.14195

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing

Juan Luis Gonzalez Bello, Xu Yao, Alex Whelan et al.

CVPR 2025 • poster • arXiv:2504.07146

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.

ICLR 2025 • poster • arXiv:2407.06189

Video Summarization Using Denoising Diffusion Probabilistic Model

Zirui Shang, Yubo Zhu, Hongxi Li et al.

AAAI 2025 • paper • arXiv:2412.08357

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • poster • arXiv:2504.11199

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025 • poster • arXiv:2503.18942

VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory

Young-Jae Park, Minseok Seo, Hae-Gon Jeon

NEURIPS 2025 • poster

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.

CVPR 2025 • poster • arXiv:2405.19209

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NEURIPS 2025 • poster • arXiv:2503.01739

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Yazhou Xing, Yang Fei, Yingqing He et al.

ICCV 2025 • poster

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Yichao Shen, Fangyun Wei, Zhiying Du et al.

NEURIPS 2025 • poster • arXiv:2512.06963

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Lawrence Jang, Yinheng Li, Dan Zhao et al.

ICLR 2025 • poster • arXiv:2410.19100

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025 • poster • arXiv:2501.09781

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025 • oral • arXiv:2506.05284

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025 • poster • arXiv:2409.14485 • 144 citations

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos

Baoyu Liang, Qile Su, Shoutai Zhu et al.

AAAI 2025 • paper • arXiv:2506.02448

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild

Peijun Bao, Chenqi Kong, Siyuan Yang et al.

ICCV 2025 • poster

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli

CVPR 2025 • poster • arXiv:2412.03735

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Tianchen Zhao, Tongcheng Fang, Haofeng Huang et al.

ICLR 2025 • poster • arXiv:2406.02540

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 • poster • arXiv:2406.04321

VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models

Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta et al.

CVPR 2025 • poster

Vid-SME: Membership Inference Attacks against Large Video Understanding Models

Qi Li, Runpeng Yu, Xinchao Wang

NEURIPS 2025 • oral • arXiv:2506.03179

VidTwin: Video VAE with Decoupled Structure and Dynamics

Yuchi Wang, Junliang Guo, Xinyi Xie et al.

CVPR 2025 • poster • arXiv:2412.17726

Vietnamese Words Are Not Constructed from Syllables: Rethinking the Role of Word Segmentation in Natural Language Processing for Vietnamese Texts

Nghia Hieu Nguyen, Dat Tien Nguyen, Ngan Luu-Thuy Nguyen

AAAI 2025 • paper

ViewCraft3D: High-fidelity and View-Consistent 3D Vector Graphics Synthesis

Chuang Wang, Haitao Zhou, Ling Luo et al.

NEURIPS 2025 • poster • arXiv:2505.19492

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Zixun Fang, Kai Zhu, Zhiheng Liu et al.

NEURIPS 2025 • poster • arXiv:2506.23513

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Mi Luo, Zihui Xue, Alex Dimakis et al.

CVPR 2025 • poster

Viewpoint-Tolerant Depth Perception for Shared Extended Space Experience on Wall-Sized Display

Dooyoung Kim, Jinseok Hong, Heejeong Ko et al.

ISMAR 2025 • paper • arXiv:2508.06889

ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

Ronggang Huang, Haoxin Yang, Yan Cai et al.

ICCV 2025 • poster • arXiv:2507.11261

View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection

Qi Zhang, Zhouhang Luo, Tao Yu et al.

AAAI 2025 • paper • arXiv:2412.11428

ViFactCheck: A New Benchmark Dataset and Methods for Multi-Domain News Fact-Checking In Vietnamese

Tran Thai Hoa, Tran Quang Duy, Khanh Quoc Tran et al.

AAAI 2025 • paper • arXiv:2412.15308

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025 • poster
