CVPR 2025 "multi-modal large language models" Papers
3 papers found
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
CVPR 2025 | poster | arXiv:2503.16036
14 citations
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
CVPR 2025 | poster | arXiv:2409.14485
144 citations
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
CVPR 2025 | poster | arXiv:2503.12077
5 citations