2025 Papers
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.
Video Perception Models for 3D Scene Synthesis
Rui Huang, Guangyao Zhai, Zuria Bauer et al.
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal, Zongyu Lin, Tianyi Xie et al.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu, Yanjiang Guo, Pengchao Wang et al.
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Yongdong Luo, Xiawu Zheng, Guilin Li et al.
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li et al.
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.
Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
Yongliang Wu, Wenbo Zhu, Jiawang Cao et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Qi Wang, Yanrui Yu, Ye Yuan et al.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Xilin Wei, Xiaoran Liu, Yuhang Zang et al.
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He et al.
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun, Yudong Yang, Jimin Zhuang et al.
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
Xin Liu, Haoran Li, Dongbin Zhao
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi et al.
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu, Jie Zhang, Yiming Li et al.
VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
Juan Luis Gonzalez Bello, Xu Yao, Alex Whelan et al.
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.
Video Summarization Using Denoising Diffusion Probabilistic Model
Zirui Shang, Yubo Zhu, Hongxi Li et al.
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory
Young-Jae Park, Minseok Seo, Hae-Gon Jeon
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Dan Zhao et al.
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
Video World Models with Long-term Spatial Memory
Tong Wu, Shuai Yang, Ryan Po et al.
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
Baoyu Liang, Qile Su, Shoutai Zhu et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, Siyuan Yang et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao, Tongcheng Fang, Haofeng Huang et al.
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models
Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta et al.
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang, Junliang Guo, Xinyi Xie et al.
Vietnamese Words Are Not Constructed from Syllables: Rethinking the Role of Word Segmentation in Natural Language Processing for Vietnamese Texts
Nghia Hieu Nguyen, Dat Tien Nguyen, Ngan Luu-Thuy Nguyen
ViewCraft3D: High-fidelity and View-Consistent 3D Vector Graphics Synthesis
Chuang Wang, Haitao Zhou, Ling Luo et al.
ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
Zixun Fang, Kai Zhu, Zhiheng Liu et al.
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
Mi Luo, Zihui Xue, Alex Dimakis et al.
Viewpoint-Tolerant Depth Perception for Shared Extended Space Experience on Wall-Sized Display
Dooyoung Kim, Jinseok Hong, Heejeong Ko et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
Qi Zhang, Zhouhang Luo, Tao Yu et al.
ViFactCheck: A New Benchmark Dataset and Methods for Multi-Domain News Fact-Checking In Vietnamese
Tran Thai Hoa, Tran Quang Duy, Khanh Quoc Tran et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.