NeurIPS Spotlight "multimodal large language models" Papers
7 papers found
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
NeurIPS 2025 Spotlight · arXiv:2505.20147 · 20 citations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
NeurIPS 2025 Spotlight · arXiv:2505.21375 · 11 citations
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen et al.
NeurIPS 2025 Spotlight · arXiv:2306.13394 · 1237 citations
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.
NeurIPS 2025 Spotlight · arXiv:2412.18319 · 102 citations
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
Rui Xu, Dakuan Lu, Zicheng Zhao et al.
NeurIPS 2025 Spotlight · arXiv:2511.18450 · 2 citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
NeurIPS 2025 Spotlight · arXiv:2501.01957 · 130 citations
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Hyungjoo Chae, Seonghwan Kim, Junhee Cho et al.
NeurIPS 2025 Spotlight · arXiv:2505.15277 · 8 citations