2025 "video captioning" Papers
9 papers found
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
ICCV 2025posterarXiv:2506.07371
3
citations
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
ICLR 2025oralarXiv:2408.06072
1355
citations
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat
CVPR 2025posterarXiv:2503.08585
12
citations
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.
CVPR 2025posterarXiv:2411.18042
9
citations
Modeling dynamic social vision highlights gaps between deep learning and humans
Kathy Garcia, Emalie McMahon, Colin Conwell et al.
ICLR 2025poster
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.
NEURIPS 2025oralarXiv:2504.13180
40
citations
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
CVPR 2025posterarXiv:2412.02071
7
citations
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Mingfei Han, Linjie Yang, Xiaojun Chang et al.
ICLR 2025posterarXiv:2312.10300
46
citations
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
NEURIPS 2025posterarXiv:2503.01739
10
citations