NEURIPS 2025 "multi-modal large language models" Papers

10 papers found

EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

Shengyuan Liu, Boyun Zheng, Wenting Chen et al.

NEURIPS 2025 poster · arXiv:2505.23601 · 9 citations

Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants

Lixiong Qin, Shilong Ou, Miaoxuan Zhang et al.

NEURIPS 2025 poster · arXiv:2501.01243 · 8 citations

FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning

Lu Zhang, Jiazuo Yu, Haomiao Xiong et al.

NEURIPS 2025 poster · arXiv:2510.21311

First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training

Lai Wei, Yuting Li, Chen Wang et al.

NEURIPS 2025 poster · arXiv:2505.22453 · 10 citations

Generative RLHF-V: Learning Principles from Multi-modal Human Preference

Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.

NEURIPS 2025 poster · arXiv:2505.18531 · 7 citations

HOComp: Interaction-Aware Human-Object Composition

Dong Liang, Jinyuan Jia, Yuhao Liu et al.

NEURIPS 2025 poster · arXiv:2507.16813

Multi-step Visual Reasoning with Visual Tokens Scaling and Verification

Tianyi Bai, Zengjie Hu, Fupeng Sun et al.

NEURIPS 2025 poster · arXiv:2506.07235 · 11 citations

RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts

Xuming He, Zhiyuan You, Junchao Gong et al.

NEURIPS 2025 poster · arXiv:2508.12291 · 4 citations

STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving

Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.

NEURIPS 2025 oral · arXiv:2506.06218 · 4 citations

To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning

Ming Li, Jike Zhong, Shitian Zhao et al.

NEURIPS 2025 spotlight