2025 "video question answering" Papers

19 papers found

Adaptive Keyframe Sampling for Long Video Understanding

Xi Tang, Jihao Qiu, Lingxi Xie et al.

CVPR 2025posterarXiv:2502.21271
68
citations

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

Xue zhucun, Jiangning Zhang, Xie Xurong et al.

NeurIPS 2025posterarXiv:2506.13589
7
citations

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.

CVPR 2025posterarXiv:2501.04336
7
citations

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.

NeurIPS 2025oralarXiv:2505.18079
17
citations

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025posterarXiv:2502.04144
38
citations

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Joya Chen, Yiqi Lin, Ziyun Zeng et al.

CVPR 2025posterarXiv:2504.16030
4
citations

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.

NeurIPS 2025posterarXiv:2502.16671
7
citations

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng et al.

ICLR 2025posterarXiv:2406.08407
34
citations

MR. Video: MapReduce as an Effective Principle for Long Video Understanding

Ziqi Pang, Yu-Xiong Wang

NeurIPS 2025poster

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025posterarXiv:2501.00584
9
citations

OSKAR: Omnimodal Self-supervised Knowledge Abstraction and Representation

Mohamed Abdelfattah, Kaouther Messaoud, Alexandre Alahi

NeurIPS 2025poster

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NeurIPS 2025posterarXiv:2507.07966
38
citations

SEAL: Semantic Attention Learning for Long Video Representation

Lan Wang, Yujia Chen, Wen-Sheng Chu et al.

CVPR 2025posterarXiv:2412.01798
7
citations

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Mingfei Han, Linjie Yang, Xiaojun Chang et al.

ICLR 2025posterarXiv:2312.10300
46
citations

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

Ayush Gupta, Anirban Roy, Rama Chellappa et al.

ICCV 2025posterarXiv:2506.09445

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NeurIPS 2025spotlightarXiv:2504.15376
27
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025posterarXiv:2409.01071
3
citations

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.

CVPR 2025posterarXiv:2412.18609
1
citations

VITED: Video Temporal Evidence Distillation

Yujie Lu, Yale Song, Lorenzo Torresani et al.

CVPR 2025posterarXiv:2503.12855
2
citations