2025 Spotlight "multimodal large language models" Papers
4 papers found
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
NeurIPS 2025 · spotlight · arXiv:2505.20147
20 citations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
NeurIPS 2025 · spotlight · arXiv:2505.21375
11 citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
NeurIPS 2025 · spotlight · arXiv:2501.01957
130 citations
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Hyungjoo Chae, Seonghwan Kim, Junhee Cho et al.
NeurIPS 2025 · spotlight · arXiv:2505.15277
8 citations