2025 "video-language models" Papers

10 papers found

Can Text-to-Video Generation help Video-Language Alignment?

Luca Zanella, Massimiliano Mancini, Willi Menapace et al.

CVPR 2025 · poster · arXiv:2503.18507
1 citation

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Han Wang, Yuxiang Nie, Yongjie Ye et al.

ICCV 2025 · poster · arXiv:2412.09530
15 citations

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 · poster · arXiv:2408.00672
11 citations

Factorized Learning for Temporally Grounded Video-Language Models

Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.

ICCV 2025 · poster · arXiv:2512.24097

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 · poster · arXiv:2504.07745
11 citations

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025 · poster · arXiv:2501.13468
28 citations

Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.

NeurIPS 2025 · spotlight · arXiv:2504.15376
27 citations

Two Causally Related Needles in a Video Haystack

Miaoyu Li, Qin Chao, Boyang Li

NeurIPS 2025 · poster · arXiv:2505.19853

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

Yuxuan Wang, Yiqi Song, Cihang Xie et al.

ICCV 2025 · poster · arXiv:2409.01071
3 citations

World Model on Million-Length Video And Language With Blockwise RingAttention

Hao Liu, Wilson Yan, Matei Zaharia et al.

ICLR 2025 · oral · arXiv:2402.08268
144 citations