Most Cited NEURIPS "zero-shot policy transfer" Papers

5,858 papers found • Page 1 of 30

#1

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NEURIPS 2025 spotlight arXiv:2306.13394
1277
citations
#2

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu et al.

NEURIPS 2025 arXiv:2503.14476
1211
citations
#3

YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian, Qixiang Ye, David Doermann

NEURIPS 2025 arXiv:2502.12524
938
citations
#4

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025 oral arXiv:2504.13837
540
citations
#5

Gymnasium: A Standard Interface for Reinforcement Learning Environments

Mark Towers, Ariel Kwiatkowski, John Balis et al.

NEURIPS 2025 spotlight arXiv:2407.17032
534
citations
#6

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You et al.

NEURIPS 2025 oral arXiv:2502.09992
403
citations
#7

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Jingcheng Hu, Yinmin Zhang, Qi Han et al.

NEURIPS 2025 arXiv:2503.24290
347
citations
#8

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Shenzhi Wang, Le Yu, Chang Gao et al.

NEURIPS 2025 arXiv:2506.01939
305
citations
#9

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh Vahid et al.

NEURIPS 2025 arXiv:2506.06941
277
citations
#10

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025 oral arXiv:2503.21776
257
citations
#11

A-Mem: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei et al.

NEURIPS 2025 arXiv:2502.12110
250
citations
#12

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025 arXiv:2505.05470
221
citations
#13

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NEURIPS 2025 spotlight arXiv:2503.13657
204
citations
#14

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Xiaoxi Li, Jiajie Jin, Guanting Dong et al.

NEURIPS 2025 arXiv:2504.21776
198
citations
#15

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Yiping Wang, Qing Yang, Zhiyuan Zeng et al.

NEURIPS 2025 arXiv:2504.20571
190
citations
#16

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.

NEURIPS 2025 oral arXiv:2505.13447
185
citations
#17

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025 spotlight arXiv:2504.08837
183
citations
#18

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NEURIPS 2025 oral arXiv:2504.13958
178
citations
#19

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NEURIPS 2025 arXiv:2502.04463
178
citations
#20

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NEURIPS 2025 spotlight arXiv:2502.05171
158
citations
#21

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Yuxiang Wei, Olivier Duchenne, Jade Copet et al.

NEURIPS 2025 arXiv:2502.18449
156
citations
#22

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Andrew Zhao, Yiran Wu, Yang Yue et al.

NEURIPS 2025 spotlight arXiv:2505.03335
147
citations
#23

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, Vahab Mirrokni

NEURIPS 2025 arXiv:2501.00663
147
citations
#24

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He et al.

NEURIPS 2025 spotlight arXiv:2506.08009
145
citations
#25

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NEURIPS 2025 spotlight arXiv:2501.01957
138
citations
#26

MMaDA: Multimodal Large Diffusion Language Models

Ling Yang, Ye Tian, Bowen Li et al.

NEURIPS 2025 arXiv:2505.15809
135
citations
#27

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun et al.

NEURIPS 2025 oral arXiv:2504.13181
129
citations
#28

Learning to Reason under Off-Policy Guidance

Jianhao Yan, Yafu Li, Zican Hu et al.

NEURIPS 2025 arXiv:2504.14945
129
citations
#29

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025 arXiv:2504.16084
129
citations
#30

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025 arXiv:2501.13918
127
citations
#31

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Xeron Du, Yifan Yao, Kaijing Ma et al.

NEURIPS 2025 arXiv:2502.14739
118
citations
#32

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025 arXiv:2505.24298
117
citations
#33

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Yuhui Li, Fangyun Wei, Chao Zhang et al.

NEURIPS 2025 arXiv:2503.01840
115
citations
#34

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

Wei Wang, Haifeng Xia, Chao Huang et al.

NEURIPS 2025 oral
115
citations
#35

MoBA: Mixture of Block Attention for Long-Context LLMs

Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.

NEURIPS 2025 spotlight arXiv:2502.13189
109
citations
#36

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.

NEURIPS 2025 spotlight arXiv:2412.18319
106
citations
#37

Show-o2: Improved Native Unified Multimodal Models

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

NEURIPS 2025 oral arXiv:2506.15564
106
citations
#38

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Yuan Feng, Junlin Lv, Yukun Cao et al.

NEURIPS 2025 arXiv:2407.11550
106
citations
#39

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Frank (Fangzheng) Xu, Yufan Song, Boxuan Li et al.

NEURIPS 2025 arXiv:2412.14161
105
citations
#40

Group-in-Group Policy Optimization for LLM Agent Training

Lang Feng, Zhenghai Xue, Tingcong Liu et al.

NEURIPS 2025 arXiv:2505.10978
105
citations
#41

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025 arXiv:2505.24864
104
citations
#42

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NEURIPS 2025 arXiv:2502.18080
103
citations
#43

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Shivam Agarwal, Zimin Zhang, Lifan Yuan et al.

NEURIPS 2025 arXiv:2505.15134
102
citations
#44

Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Zechuan Zhang, Ji Xie, Yu Lu et al.

NEURIPS 2025 arXiv:2504.20690
100
citations
#45

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 arXiv:2505.00703
100
citations
#46

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li et al.

NEURIPS 2025 arXiv:2505.20275
98
citations
#47

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NEURIPS 2025 arXiv:2505.22648
98
citations
#48

Remarkable Robustness of LLMs: Stages of Inference?

Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.

NEURIPS 2025 oral arXiv:2406.19384
95
citations
#49

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

Sreyan Ghosh, Arushi Goel, Jaehyeon Kim et al.

NEURIPS 2025 spotlight arXiv:2507.08128
94
citations
#50

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Di Liu, Meng Chen, Baotong Lu et al.

NEURIPS 2025 arXiv:2409.10516
90
citations
#51

Remasking Discrete Diffusion Models with Inference-Time Scaling

Guanghan Wang, Yair Schiff, Subham Sahoo et al.

NEURIPS 2025 arXiv:2503.00307
90
citations
#52

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

Xiner Li, Yulai Zhao, Chenyu Wang et al.

NEURIPS 2025 arXiv:2408.08252
90
citations
#53

SWE-smith: Scaling Data for Software Engineering Agents

John Yang, Kilian Lieret, Carlos Jimenez et al.

NEURIPS 2025 spotlight arXiv:2504.21798
89
citations
#54

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Zhewei Kang, Xuandong Zhao, Dawn Song

NEURIPS 2025 arXiv:2502.18581
89
citations
#55

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NEURIPS 2025 arXiv:2506.01347
89
citations
#56

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NEURIPS 2025 spotlight arXiv:2504.12216
87
citations
#57

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Mengkang Hu, Yuhang Zhou, Wendong Fan et al.

NEURIPS 2025 arXiv:2505.23885
87
citations
#58

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NEURIPS 2025 arXiv:2505.14652
86
citations
#59

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 arXiv:2412.15188
86
citations
#60

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

Shuang Zeng, Xinyuan Chang, Mengwei Xie et al.

NEURIPS 2025 oral arXiv:2505.17685
84
citations
#61

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Akshara Prabhakar, Zuxin Liu, Ming Zhu et al.

NEURIPS 2025 arXiv:2504.03601
82
citations
#62

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Xiyao Wang, Zhengyuan Yang, Chao Feng et al.

NEURIPS 2025 spotlight arXiv:2504.07934
81
citations
#63

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

David Chanin, James Wilken-Smith, Tomáš Dulka et al.

NEURIPS 2025 oral arXiv:2409.14507
81
citations
#64

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Chongyu Fan, Jiancheng Liu, Licong Lin et al.

NEURIPS 2025 arXiv:2410.07163
81
citations
#65

UniTok: a Unified Tokenizer for Visual Generation and Understanding

Chuofan Ma, Yi Jiang, Junfeng Wu et al.

NEURIPS 2025 spotlight arXiv:2502.20321
79
citations
#66

dKV-Cache: The Cache for Diffusion Language Models

Xinyin Ma, Runpeng Yu, Gongfan Fang et al.

NEURIPS 2025 arXiv:2505.15781
79
citations
#67

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

Qingyang Zhang, Haitao Wu, Changqing Zhang et al.

NEURIPS 2025 spotlight arXiv:2504.05812
78
citations
#68

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Zihan Qiu, Zekun Wang, Bo Zheng et al.

NEURIPS 2025 oral arXiv:2505.06708
77
citations
#69

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.

NEURIPS 2025 spotlight arXiv:2505.23747
77
citations
#70

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Sang Choe, Hwijeen Ahn, Juhan Bae et al.

NEURIPS 2025 arXiv:2405.13954
76
citations
#71

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Zewei Zhou, Tianhui Cai, Seth Zhao et al.

NEURIPS 2025 arXiv:2506.13757
75
citations
#72

UMA: A Family of Universal Models for Atoms

Brandon Wood, Misko Dzamba, Xiang Fu et al.

NEURIPS 2025 spotlight arXiv:2506.23971
75
citations
#73

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Yongdong Luo, Xiawu Zheng, Guilin Li et al.

NEURIPS 2025 arXiv:2411.13093
73
citations
#74

Offline Actor-Critic for Average Reward MDPs

William Powell, Jeongyeol Kwon, Qiaomin Xie et al.

NEURIPS 2025
73
citations
#75

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Ling Fu, Zhebin Kuang, Jiajun Song et al.

NEURIPS 2025 arXiv:2501.00321
73
citations
#76

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Ruicheng Wang, Sicheng Xu, Yue Dong et al.

NEURIPS 2025 arXiv:2507.02546
72
citations
#77

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Reece Shuttleworth, Jacob Andreas, Antonio Torralba et al.

NEURIPS 2025 arXiv:2410.21228
70
citations
#78

Thinkless: LLM Learns When to Think

Gongfan Fang, Xinyin Ma, Xinchao Wang

NEURIPS 2025 arXiv:2505.13379
70
citations
#79

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao et al.

NEURIPS 2025 spotlight arXiv:2503.22679
62
citations
#80

CSGO: Content-Style Composition in Text-to-Image Generation

Peng Xing, Haofan Wang, Yanpeng Sun et al.

NEURIPS 2025 arXiv:2408.16766
62
citations
#81

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

En Yu, Kangheng Lin, Liang Zhao et al.

NEURIPS 2025 arXiv:2504.07954
62
citations
#82

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Wenyao Zhang, Hongsi Liu, Zekun Qi et al.

NEURIPS 2025 arXiv:2507.04447
61
citations
#83

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan, Zhirong Huang, Wei Liu et al.

NEURIPS 2025 arXiv:2504.02605
61
citations
#84

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

NEURIPS 2025
60
citations
#85

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.

NEURIPS 2025 spotlight arXiv:2505.13227
60
citations
#86

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NEURIPS 2025 arXiv:2407.04842
60
citations
#87

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

NEURIPS 2025 arXiv:2505.15879
59
citations
#88

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NEURIPS 2025 arXiv:2507.16815
59
citations
#89

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NEURIPS 2025 spotlight arXiv:2504.12626
59
citations
#90

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025 arXiv:2506.04308
58
citations
#91

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NEURIPS 2025 arXiv:2505.13032
57
citations
#92

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025 arXiv:2503.19470
57
citations
#93

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Xiangyan Liu, Jinjie Ni, Zijian Wu et al.

NEURIPS 2025 arXiv:2504.13055
57
citations
#94

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NEURIPS 2025 arXiv:2503.22342
56
citations
#95

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

NEURIPS 2025 arXiv:2506.09965
54
citations
#96

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori et al.

NEURIPS 2025 arXiv:2504.18575
54
citations
#97

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NEURIPS 2025 arXiv:2505.24857
54
citations
#98

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess, Jost Springenberg, Brian Ichter et al.

NEURIPS 2025 spotlight arXiv:2505.23705
53
citations
#99

OmniBench: Towards The Future of Universal Omni-Language Models

Yizhi Li, Ge Zhang, Yinghao Ma et al.

NEURIPS 2025 arXiv:2409.15272
53
citations
#100

WorldMem: Long-term Consistent World Simulation with Memory

Zeqi Xiao, Yushi Lan, Yifan Zhou et al.

NEURIPS 2025 oral arXiv:2504.12369
53
citations
#101

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Noam Razin, Zixuan Wang, Hubert Strauss et al.

NEURIPS 2025 spotlight arXiv:2503.15477
53
citations
#102

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Muzhi Dai, Chenxu Yang, Qingyi Si

NEURIPS 2025 oral arXiv:2505.07686
52
citations
#103

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Jusheng Zhang, Yijia Fan, Wenjun Lin et al.

NEURIPS 2025 arXiv:2505.23399
50
citations
#104

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Yang Chen, Zhuolin Yang, Zihan Liu et al.

NEURIPS 2025 arXiv:2505.16400
50
citations
#105

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Ye Wang, Ziheng Wang, Boshen Xu et al.

NEURIPS 2025 oral arXiv:2503.13377
49
citations
#106

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, Li Zhimin, Yuhang Zang et al.

NEURIPS 2025 arXiv:2505.03318
49
citations
#107

LLM Generated Persona is a Promise with a Catch

Leon Li, Haozhe Chen, Hongseok Namkoong et al.

NEURIPS 2025 arXiv:2503.16527
49
citations
#108

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

NEURIPS 2025 arXiv:2502.13124
49
citations
#109

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

NEURIPS 2025 arXiv:2502.12018
49
citations
#110

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NEURIPS 2025 arXiv:2505.15778
48
citations
#111

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Dominic Maggio, Hyungtae Lim, Luca Carlone

NEURIPS 2025 arXiv:2505.12549
48
citations
#112

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025 arXiv:2412.03568
48
citations
#113

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025 oral arXiv:2504.02826
47
citations
#114

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025 arXiv:2504.08600
47
citations
#115

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025 spotlight arXiv:2505.24760
47
citations
#116

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NEURIPS 2025 oral arXiv:2504.13180
47
citations
#117

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025 oral arXiv:2506.07339
46
citations
#118

What Can RL Bring to VLA Generalization? An Empirical Study

Jijia Liu, Feng Gao, Bingwen Wei et al.

NEURIPS 2025 arXiv:2505.19789
46
citations
#119

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025 arXiv:2503.05244
46
citations
#120

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Hao Gao, Shaoyu Chen, Bo Jiang et al.

NEURIPS 2025 arXiv:2502.13144
45
citations
#121

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025 spotlight arXiv:2506.00413
44
citations
#122

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025 spotlight arXiv:2503.01422
44
citations
#123

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025 oral arXiv:2506.05284
44
citations
#124

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025 arXiv:2506.09350
44
citations
#125

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025 arXiv:2507.07966
44
citations
#126

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng, Yang Zhou, Brian Bartoldson et al.

NEURIPS 2025 oral arXiv:2506.02177
44
citations
#127

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Jingyang Yi, Jiazheng Wang, Sida Li

NEURIPS 2025 arXiv:2504.21370
43
citations
#128

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025 arXiv:2506.03143
43
citations
#129

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025 spotlight arXiv:2506.16791
43
citations
#130

Detecting Data Deviations in Electronic Health Records

Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi

NEURIPS 2025
43
citations
#131

Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Yongqi Chen, Haofeng Huang et al.

NEURIPS 2025 arXiv:2505.13389
42
citations
#132

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, Zhi Ji, Zhaopeng Li et al.

NEURIPS 2025 arXiv:2503.02453
41
citations
#133

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025 arXiv:2506.12851
41
citations
#134

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

Xiang Li, Feng Ruan, Huiyuan Wang et al.

NEURIPS 2025 arXiv:2404.01245
41
citations
#135

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NEURIPS 2025
40
citations
#136

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025 arXiv:2506.14965
40
citations
#137

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NEURIPS 2025 arXiv:2503.09501
40
citations
#138

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Jiahui Zhang, Yurui Chen, Yueming Xu et al.

NEURIPS 2025 arXiv:2503.22976
40
citations
#139

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NEURIPS 2025 arXiv:2505.14631
40
citations
#140

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025 spotlight arXiv:2505.18875
40
citations
#141

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NEURIPS 2025 arXiv:2501.01895
39
citations
#142

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

Songjun Tu, Jiahao Lin, Qichao Zhang et al.

NEURIPS 2025 arXiv:2505.10832
39
citations
#143

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NEURIPS 2025 arXiv:2505.11475
38
citations
#144

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NEURIPS 2025 arXiv:2505.14671
38
citations
#145

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NEURIPS 2025 arXiv:2506.14603
38
citations
#146

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025 arXiv:2507.02833
38
citations
#147

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae Bhartari, Bowen Li et al.

NEURIPS 2025 arXiv:2502.00604
38
citations
#148

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, Feng Chen et al.

NEURIPS 2025 arXiv:2407.21243
37
citations
#149

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NEURIPS 2025 arXiv:2505.17412
37
citations
#150

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NEURIPS 2025 spotlight arXiv:2508.09123
37
citations
#151

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NEURIPS 2025 arXiv:2402.15751
37
citations
#152

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025 arXiv:2502.20694
37
citations
#153

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NEURIPS 2025 arXiv:2504.07891
37
citations
#154

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025 arXiv:2505.10446
37
citations
#155

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025 spotlight arXiv:2502.13143
36
citations
#156

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Andreas Auer, Patrick Podest, Daniel Klotz et al.

NEURIPS 2025 arXiv:2505.23719
36
citations
#157

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025 spotlight arXiv:2502.08640
36
citations
#158

Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning

Wenlin Zhang, Xiangyang Li, Kuicai Dong et al.

NEURIPS 2025 arXiv:2505.14069
36
citations
#159

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025 spotlight arXiv:2503.08153
35
citations
#160

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025 arXiv:2505.19591
35
citations
#161

Harnessing the Universal Geometry of Embeddings

Rishi Jha, Collin Zhang, Vitaly Shmatikov et al.

NEURIPS 2025 arXiv:2505.12540
35
citations
#162

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025 spotlight arXiv:2505.14460
35
citations
#163

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025 arXiv:2505.14521
35
citations
#164

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Yan Wu, Esther Wershof, Sebastian Schmon et al.

NEURIPS 2025 arXiv:2408.10609
34
citations
#165

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025 arXiv:2505.21523
34
citations
#166

On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge et al.

NEURIPS 2025 arXiv:2502.10818
34
citations
#167

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025 arXiv:2505.22647
34
citations
#168

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025 oral arXiv:2506.10100
34
citations
#169

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025 spotlight arXiv:2501.06425
34
citations
#170

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025 oral arXiv:2505.12434
34
citations
#171

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, Junyan Ye, Peilin Feng et al.

NEURIPS 2025 arXiv:2503.14905
34
citations
#172

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li et al.

NEURIPS 2025 arXiv:2410.01623
34
citations
#173

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025 arXiv:2503.01822
34
citations
#174

The Leaderboard Illusion

Shivalika Singh, Yiyang Nan, Alex Wang et al.

NEURIPS 2025 arXiv:2504.20879
34
citations
#175

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Yuqi Zhou, Sunhao Dai, Shuai Wang et al.

NEURIPS 2025 arXiv:2505.15810
34
citations
#176

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025 oral arXiv:2507.01467
33
citations
#177

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025 arXiv:2505.20411
33
citations
#178

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025 arXiv:2503.03961
33
citations
#179

MAT-Agent: Adaptive Multi-Agent Training Optimization

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025 arXiv:2510.17845
33
citations
#180

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025 arXiv:2505.14489
33
citations
#181

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Sangmin Bae, Yujin Kim, Reza Bayat et al.

NEURIPS 2025 arXiv:2507.10524
33
citations
#182

Don't be lazy: CompleteP enables compute-efficient deep transformers

Nolan Dey, Bin Zhang, Lorenzo Noci et al.

NEURIPS 2025 arXiv:2505.01618
33
citations
#183

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Zhongwei Wan, Zhihao Dou, Che Liu et al.

NEURIPS 2025 arXiv:2506.01713
32
citations
#184

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025 arXiv:2506.18880
32
citations
#185

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 spotlight arXiv:2507.18624
32
citations
#186

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Guo Chen, Zhiqi Li, Shihao Wang et al.

NEURIPS 2025 arXiv:2504.15271
32
citations
#187

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Hanlin Zhu, Shibo Hao, Zhiting Hu et al.

NEURIPS 2025 arXiv:2505.12514
32
citations
#188

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025 arXiv:2506.05302
32
citations
#189

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025 arXiv:2502.00234
32
citations
#190

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025 arXiv:2505.16552
32
citations
#191

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Hanxue Liang, Jiawei Ren, Ashkan Mirzaei et al.

NEURIPS 2025 arXiv:2412.03526
31
citations
#192

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025 arXiv:2506.09038
31
citations
#193

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025 arXiv:2505.18825
31
citations
#194

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Jintao Zhang, Jia Wei, Haoxu Wang et al.

NEURIPS 2025 spotlight arXiv:2505.11594
31
citations
#195

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NEURIPS 2025 spotlight arXiv:2506.07398
31
citations
#196

Best-of-N Jailbreaking

John Hughes, Sara Price, Aengus Lynch et al.

NEURIPS 2025 arXiv:2412.03556
31
citations
#197

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

Qiuchen Wang, Ruixue Ding, Yu Zeng et al.

NEURIPS 2025 arXiv:2505.22019
31
citations
#198

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025 arXiv:2503.20762
31
citations
#199

Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Ying Jin, Zhimei Ren, Zhuoran Yang et al.

NEURIPS 2025 arXiv:2212.09900
31
citations
#200

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025 arXiv:2506.05573
31
citations