Spotlight "multimodal large language models" Papers
3 papers found
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
NeurIPS 2025, spotlight, arXiv:2505.20147
20 citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
NeurIPS 2025, spotlight, arXiv:2501.01957
130 citations
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
ICML 2024, spotlight