"video-language models" Papers
8 papers found
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella, Massimiliano Mancini, Willi Menapace et al.
CVPR 2025 (poster) · arXiv:2503.18507 · 1 citation
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
CVPR 2025 (poster) · arXiv:2504.07745 · 11 citations
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.
ICLR 2025 (poster) · arXiv:2501.13468 · 28 citations
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.
NeurIPS 2025 (spotlight) · arXiv:2504.15376 · 27 citations
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
ICCV 2025 (poster) · arXiv:2409.01071 · 3 citations
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei Zaharia et al.
ICLR 2025 (oral) · arXiv:2402.08268 · 144 citations
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen, Qibo Qiu et al.
ECCV 2024 (poster) · arXiv:2407.14872 · 4 citations
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu, Pengxiang Ding, Siteng Huang et al.
ECCV 2024 (poster) · arXiv:2409.07239 · 9 citations