CVPR 2025 "video large language models" Papers
6 papers found
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.
CVPR 2025, poster, arXiv:2501.03218
31 citations
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
CVPR 2025, poster, arXiv:2504.16030
4 citations
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
CVPR 2025, poster, arXiv:2411.19772
32 citations
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu, Yiquan Li, Khoi D Nguyen et al.
CVPR 2025, poster, arXiv:2503.19794
1 citation
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.
CVPR 2025, poster, arXiv:2411.16932
6 citations
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li et al.
CVPR 2025, poster, arXiv:2501.00599
41 citations