2025 "multi-modal large models" Papers
4 papers found
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
CVPR 2025, poster, arXiv:2310.18709
11 citations
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
NeurIPS 2025, poster, arXiv:2510.00515
8 citations
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
Chenyu Zhou, Mengdan Zhang, Peixian Chen et al.
ICLR 2025, poster, arXiv:2406.10228
5 citations
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
CVPR 2025, poster, arXiv:2406.04264
93 citations