rleak.com - Spot the Future of AI Research

#1

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NEURIPS 2025

1,237

citations

#2

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025

483

citations

#3

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh Vahid et al.

NEURIPS 2025

257

citations

#4

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025

236

citations

#5

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NEURIPS 2025

188

citations

#6

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Xiaoxi Li, Jiajie Jin, Guanting Dong et al.

NEURIPS 2025

174

citations

#7

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025

169

citations

#8

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NEURIPS 2025

155

citations

#9

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NEURIPS 2025

152

citations

#10

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.

NEURIPS 2025

143

citations

#11

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NEURIPS 2025

134

citations

#12

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NEURIPS 2025

130

citations

#13

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He et al.

NEURIPS 2025

123

citations

#14

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025

122

citations

#15

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Xeron Du, Yifan Yao, Kaijing Ma et al.

NEURIPS 2025

118

citations

#16

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

Wei Wang, Haifeng Xia, Chao Huang et al.

NEURIPS 2025

115

citations

#17

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025

112

citations

#18

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.

NEURIPS 2025

102

citations

#19

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Yuhui Li, Fangyun Wei, Chao Zhang et al.

NEURIPS 2025

102

citations

#20

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025

99

citations

Top Papers in NEURIPS 2025
