2025 "video-language models" Papers
10 papers found
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella, Massimiliano Mancini, Willi Menapace et al.
CVPR 2025 (poster) · arXiv:2503.18507
1 citation
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang, Yuxiang Nie, Yongjie Ye et al.
ICCV 2025 (poster) · arXiv:2412.09530
15 citations
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.
CVPR 2025 (poster) · arXiv:2408.00672
11 citations
Factorized Learning for Temporally Grounded Video-Language Models
Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.
ICCV 2025 (poster) · arXiv:2512.24097
SF2T: Self-supervised Fragment Fine-tuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
CVPR 2025 (poster) · arXiv:2504.07745
11 citations
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.
ICLR 2025 (poster) · arXiv:2501.13468
28 citations
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.
NeurIPS 2025 (spotlight) · arXiv:2504.15376
27 citations
Two Causally Related Needles in a Video Haystack
Miaoyu Li, Qin Chao, Boyang Li
NeurIPS 2025 (poster) · arXiv:2505.19853
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
ICCV 2025 (poster) · arXiv:2409.01071
3 citations
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei Zaharia et al.
ICLR 2025 (oral) · arXiv:2402.08268
144 citations