NEURIPS 2025 "multi-modal large language models" Papers
10 papers found
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
NEURIPS 2025 · poster · arXiv:2505.23601 · 9 citations
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
Lixiong Qin, Shilong Ou, Miaoxuan Zhang et al.
NEURIPS 2025 · poster · arXiv:2501.01243 · 8 citations
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
Lu Zhang, Jiazuo Yu, Haomiao Xiong et al.
NEURIPS 2025 · poster · arXiv:2510.21311
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
NEURIPS 2025 · poster · arXiv:2505.22453 · 10 citations
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NEURIPS 2025 · poster · arXiv:2505.18531 · 7 citations
HOComp: Interaction-Aware Human-Object Composition
Dong Liang, Jinyuan Jia, Yuhao LIU et al.
NEURIPS 2025 · poster · arXiv:2507.16813
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun et al.
NEURIPS 2025 · poster · arXiv:2506.07235 · 11 citations
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
Xuming He, Zhiyuan You, Junchao Gong et al.
NEURIPS 2025 · poster · arXiv:2508.12291 · 4 citations
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.
NEURIPS 2025 · oral · arXiv:2506.06218 · 4 citations
To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning
Ming Li, Jike Zhong, Shitian Zhao et al.
NEURIPS 2025 · spotlight