CVPR "multi-modal large language models" Papers
4 papers found
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
CVPR 2025, poster, arXiv:2504.07962
21 citations
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
CVPR 2025, poster, arXiv:2503.16036
14 citations
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Zheng Liu, Peitian Zhang et al.
CVPR 2025, poster, arXiv:2409.14485
144 citations
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
CVPR 2025, poster, arXiv:2503.12077
5 citations