2025 "multimodal llms" Papers
7 papers found
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu, Jaehong Yoon, Mohit Bansal
ICLR 2025 (poster) · arXiv:2402.05889 · 16 citations
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.
ICCV 2025 (poster) · arXiv:2505.00681 · 9 citations
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, Xingyu Fu, James Y. Huang et al.
ICLR 2025 (oral) · arXiv:2406.09411 · 113 citations
NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables
Lanrui Wang, Mingyu Zheng, Hongyin Tang et al.
NeurIPS 2025 (poster) · arXiv:2504.06560 · 3 citations
Passing the Driving Knowledge Test
Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar
ICCV 2025 (poster) · arXiv:2508.21824 · 1 citation
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.
ICLR 2025 (poster) · arXiv:2410.02740 · 9 citations
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu, Hao Fei, Xiangtai Li et al.
ICLR 2025 (poster) · arXiv:2406.05127 · 58 citations