Spotlight "multimodal large language models" Papers
10 papers found
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang, Chongjie Si, Jun Luo et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.17534 · 5 citations
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.20147 · 20 citations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.21375 · 11 citations
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen et al.
NeurIPS 2025 (Spotlight) · arXiv:2306.13394 · 1255 citations
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.
NeurIPS 2025 (Spotlight) · arXiv:2412.18319 · 102 citations
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
Rui Xu, Dakuan Lu, Zicheng Zhao et al.
NeurIPS 2025 (Spotlight) · arXiv:2511.18450 · 2 citations
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng, Haiyang Guo, Fei Zhu et al.
NeurIPS 2025 (Spotlight) · arXiv:2502.17159 · 7 citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
NeurIPS 2025 (Spotlight) · arXiv:2501.01957 · 130 citations
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Hyungjoo Chae, Seonghwan Kim, Junhee Cho et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.15277 · 8 citations
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
ICML 2024 (Spotlight) · arXiv:2402.05930