Papers matching "multi-modal large models"
3 papers found
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
CVPR 2025 poster, arXiv:2310.18709
11 citations
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
Chenyu Zhou, Mengdan Zhang, Peixian Chen et al.
ICLR 2025 poster, arXiv:2406.10228
5 citations
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
CVPR 2025 poster, arXiv:2406.04264
93 citations