Most Cited NEURIPS "zero-shot policy transfer" Papers

5,858 papers found • Page 1 of 30

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NEURIPS 2025spotlightarXiv:2306.13394

1277

citations

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu et al.

NEURIPS 2025arXiv:2503.14476

1211

citations

YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian, Qixiang Ye, David Doermann

NEURIPS 2025arXiv:2502.12524

938

citations

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu et al.

NEURIPS 2025oralarXiv:2504.13837

540

citations

Gymnasium: A Standard Interface for Reinforcement Learning Environments

Mark Towers, Ariel Kwiatkowski, John Balis et al.

NEURIPS 2025spotlightarXiv:2407.17032

534

citations

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You et al.

NEURIPS 2025oralarXiv:2502.09992

403

citations

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Jingcheng Hu, Yinmin Zhang, Qi Han et al.

NEURIPS 2025arXiv:2503.24290

347

citations

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Shenzhi Wang, Le Yu, Chang Gao et al.

NEURIPS 2025arXiv:2506.01939

305

citations

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh Vahid et al.

NEURIPS 2025arXiv:2506.06941

277

citations

#10

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NEURIPS 2025oralarXiv:2503.21776

257

citations

#11

A-Mem: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei et al.

NEURIPS 2025arXiv:2502.12110

250

citations

#12

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025arXiv:2505.05470

221

citations

#13

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NEURIPS 2025spotlightarXiv:2503.13657

204

citations

#14

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Xiaoxi Li, Jiajie Jin, Guanting Dong et al.

NEURIPS 2025arXiv:2504.21776

198

citations

#15

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Yiping Wang, Qing Yang, Zhiyuan Zeng et al.

NEURIPS 2025arXiv:2504.20571

190

citations

#16

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai et al.

NEURIPS 2025oralarXiv:2505.13447

185

citations

#17

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025spotlightarXiv:2504.08837

183

citations

#18

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NEURIPS 2025oralarXiv:2504.13958

178

citations

#19

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NEURIPS 2025arXiv:2502.04463

178

citations

#20

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NEURIPS 2025spotlightarXiv:2502.05171

158

citations

#21

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Yuxiang Wei, Olivier Duchenne, Jade Copet et al.

NEURIPS 2025arXiv:2502.18449

156

citations

#22

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Andrew Zhao, Yiran Wu, Yang Yue et al.

NEURIPS 2025spotlightarXiv:2505.03335

147

citations

#23

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, Vahab Mirrokni

NEURIPS 2025arXiv:2501.00663

147

citations

#24

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He et al.

NEURIPS 2025spotlightarXiv:2506.08009

145

citations

#25

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NEURIPS 2025spotlightarXiv:2501.01957

138

citations

#26

MMaDA: Multimodal Large Diffusion Language Models

Ling Yang, Ye Tian, Bowen Li et al.

NEURIPS 2025arXiv:2505.15809

135

citations

#27

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun et al.

NEURIPS 2025oralarXiv:2504.13181

129

citations

#28

Learning to Reason under Off-Policy Guidance

Jianhao Yan, Yafu Li, Zican Hu et al.

NEURIPS 2025arXiv:2504.14945

129

citations

#29

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NEURIPS 2025arXiv:2504.16084

129

citations

#30

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025arXiv:2501.13918

127

citations

#31

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Xeron Du, Yifan Yao, Kaijing Ma et al.

NEURIPS 2025arXiv:2502.14739

118

citations

#32

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025arXiv:2505.24298

117

citations

#33

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Yuhui Li, Fangyun Wei, Chao Zhang et al.

NEURIPS 2025arXiv:2503.01840

115

citations

#34

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

Wei Wang, Haifeng Xia, Chao Huang et al.

#35

MoBA: Mixture of Block Attention for Long-Context LLMs

Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.

NEURIPS 2025spotlightarXiv:2502.13189

109

citations

#36

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Huanjin Yao, Jiaxing Huang, Wenhao Wu et al.

NEURIPS 2025spotlightarXiv:2412.18319

106

citations

#37

Show-o2: Improved Native Unified Multimodal Models

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

NEURIPS 2025oralarXiv:2506.15564

106

citations

#38

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Yuan Feng, Junlin Lv, Yukun Cao et al.

NEURIPS 2025arXiv:2407.11550

106

citations

#39

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Frank (Fangzheng) Xu, Yufan Song, Boxuan Li et al.

NEURIPS 2025arXiv:2412.14161

105

citations

#40

Group-in-Group Policy Optimization for LLM Agent Training

Lang Feng, Zhenghai Xue, Tingcong Liu et al.

NEURIPS 2025arXiv:2505.10978

105

citations

#41

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025arXiv:2505.24864

104

citations

#42

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Wenkai Yang, Shuming Ma, Yankai Lin et al.

NEURIPS 2025arXiv:2502.18080

103

citations

#43

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Shivam Agarwal, Zimin Zhang, Lifan Yuan et al.

NEURIPS 2025arXiv:2505.15134

102

citations

#44

Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Zechuan Zhang, Ji Xie, Yu Lu et al.

NEURIPS 2025arXiv:2504.20690

100

citations

#45

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025arXiv:2505.00703

100

citations

#46

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li et al.

NEURIPS 2025arXiv:2505.20275

citations

#47

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NEURIPS 2025arXiv:2505.22648

citations

#48

Remarkable Robustness of LLMs: Stages of Inference?

Vedang Lad, Jin Hwa Lee, Wes Gurnee et al.

NEURIPS 2025oralarXiv:2406.19384

citations

#49

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

Sreyan Ghosh, Arushi Goel, Jaehyeon Kim et al.

NEURIPS 2025spotlightarXiv:2507.08128

citations

#50

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Di Liu, Meng Chen, Baotong Lu et al.

NEURIPS 2025arXiv:2409.10516

citations

#51

Remasking Discrete Diffusion Models with Inference-Time Scaling

Guanghan Wang, Yair Schiff, Subham Sahoo et al.

NEURIPS 2025arXiv:2503.00307

citations

#52

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

Xiner Li, Yulai Zhao, Chenyu Wang et al.

NEURIPS 2025arXiv:2408.08252

citations

#53

SWE-smith: Scaling Data for Software Engineering Agents

John Yang, Kilian Lieret, Carlos Jimenez et al.

NEURIPS 2025spotlightarXiv:2504.21798

citations

#54

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Zhewei Kang, Xuandong Zhao, Dawn Song

NEURIPS 2025arXiv:2502.18581

citations

#55

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NEURIPS 2025arXiv:2506.01347

citations

#56

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NEURIPS 2025spotlightarXiv:2504.12216

citations

#57

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Mengkang Hu, Yuhang Zhou, Wendong Fan et al.

NEURIPS 2025arXiv:2505.23885

citations

#58

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NEURIPS 2025arXiv:2505.14652

citations

#59

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025arXiv:2412.15188

citations

#60

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

Shuang Zeng, Xinyuan Chang, Mengwei Xie et al.

NEURIPS 2025oralarXiv:2505.17685

citations

#61

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Akshara Prabhakar, Zuxin Liu, Ming Zhu et al.

NEURIPS 2025arXiv:2504.03601

citations

#62

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Xiyao Wang, Zhengyuan Yang, Chao Feng et al.

NEURIPS 2025spotlightarXiv:2504.07934

citations

#63

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

David Chanin, James Wilken-Smith, Tomáš Dulka et al.

NEURIPS 2025oralarXiv:2409.14507

citations

#64

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Chongyu Fan, Jiancheng Liu, Licong Lin et al.

NEURIPS 2025arXiv:2410.07163

citations

#65

UniTok: a Unified Tokenizer for Visual Generation and Understanding

Chuofan Ma, Yi Jiang, Junfeng Wu et al.

NEURIPS 2025spotlightarXiv:2502.20321

citations

#66

dKV-Cache: The Cache for Diffusion Language Models

Xinyin Ma, Runpeng Yu, Gongfan Fang et al.

NEURIPS 2025arXiv:2505.15781

citations

#67

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

Qingyang Zhang, Haitao Wu, Changqing Zhang et al.

NEURIPS 2025spotlightarXiv:2504.05812

citations

#68

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Zihan Qiu, Zekun Wang, Bo Zheng et al.

NEURIPS 2025oralarXiv:2505.06708

citations

#69

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.

NEURIPS 2025spotlightarXiv:2505.23747

citations

#70

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Sang Choe, Hwijeen Ahn, Juhan Bae et al.

NEURIPS 2025arXiv:2405.13954

citations

#71

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Zewei Zhou, Tianhui Cai, Seth Zhao et al.

NEURIPS 2025arXiv:2506.13757

citations

#72

UMA: A Family of Universal Models for Atoms

Brandon Wood, Misko Dzamba, Xiang Fu et al.

NEURIPS 2025spotlightarXiv:2506.23971

citations

#73

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Yongdong Luo, Xiawu Zheng, Guilin Li et al.

NEURIPS 2025arXiv:2411.13093

citations

#74

Offline Actor-Critic for Average Reward MDPs

William Powell, Jeongyeol Kwon, Qiaomin Xie et al.

NEURIPS 2025

citations

#75

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Ling Fu, Zhebin Kuang, Jiajun Song et al.

NEURIPS 2025arXiv:2501.00321

citations

#76

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Ruicheng Wang, Sicheng Xu, Yue Dong et al.

NEURIPS 2025arXiv:2507.02546

citations

#77

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Reece Shuttleworth, Jacob Andreas, Antonio Torralba et al.

NEURIPS 2025arXiv:2410.21228

citations

#78

Thinkless: LLM Learns When to Think

Gongfan Fang, Xinyin Ma, Xinchao Wang

NEURIPS 2025arXiv:2505.13379

citations

#79

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao et al.

NEURIPS 2025spotlightarXiv:2503.22679

citations

#80

CSGO: Content-Style Composition in Text-to-Image Generation

Peng Xing, Haofan Wang, Yanpeng Sun et al.

NEURIPS 2025arXiv:2408.16766

citations

#81

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

En Yu, Kangheng Lin, Liang Zhao et al.

NEURIPS 2025arXiv:2504.07954

citations

#82

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Wenyao Zhang, Hongsi Liu, Zekun Qi et al.

NEURIPS 2025arXiv:2507.04447

citations

#83

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan, Zhirong Huang, Wei Liu et al.

NEURIPS 2025arXiv:2504.02605

citations

#84

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

Rongyao Fang, Chengqi Duan, Kun Wang et al.

NEURIPS 2025

citations

#85

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.

NEURIPS 2025spotlightarXiv:2505.13227

citations

#86

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NEURIPS 2025arXiv:2407.04842

citations

#87

GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang et al.

NEURIPS 2025arXiv:2505.15879

citations

#88

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.

NEURIPS 2025arXiv:2507.16815

citations

#89

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

Lvmin Zhang, Shengqu Cai, Muyang Li et al.

NEURIPS 2025spotlightarXiv:2504.12626

citations

#90

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025arXiv:2506.04308

citations

#91

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.

NEURIPS 2025arXiv:2505.13032

citations

#92

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025arXiv:2503.19470

citations

#93

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Xiangyan Liu, Jinjie Ni, Zijian Wu et al.

NEURIPS 2025arXiv:2504.13055

citations

#94

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NEURIPS 2025arXiv:2503.22342

citations

#95

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Junfei Wu, Jian Guan, Kaituo Feng et al.

NEURIPS 2025arXiv:2506.09965

citations

#96

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori et al.

NEURIPS 2025arXiv:2504.18575

citations

#97

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NEURIPS 2025arXiv:2505.24857

citations

#98

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess, Jost Springenberg, Brian Ichter et al.

NEURIPS 2025spotlightarXiv:2505.23705

citations

#99

OmniBench: Towards The Future of Universal Omni-Language Models

Yizhi Li, Ge Zhang, Yinghao Ma et al.

NEURIPS 2025arXiv:2409.15272

citations

#100

WorldMem: Long-term Consistent World Simulation with Memory

Zeqi Xiao, Yushi Lan, Yifan Zhou et al.

NEURIPS 2025oralarXiv:2504.12369

citations

#101

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Noam Razin, Zixuan Wang, Hubert Strauss et al.

NEURIPS 2025spotlightarXiv:2503.15477

citations

#102

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Muzhi Dai, Chenxu Yang, Qingyi Si

NEURIPS 2025oralarXiv:2505.07686

citations

#103

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Jusheng Zhang, Yijia Fan, Wenjun Lin et al.

NEURIPS 2025arXiv:2505.23399

citations

#104

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Yang Chen, Zhuolin Yang, Zihan Liu et al.

NEURIPS 2025arXiv:2505.16400

citations

#105

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Ye Wang, Ziheng Wang, Boshen Xu et al.

NEURIPS 2025oralarXiv:2503.13377

citations

#106

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Yibin Wang, Li Zhimin, Yuhang Zang et al.

NEURIPS 2025arXiv:2505.03318

citations

#107

LLM Generated Persona is a Promise with a Catch

Leon Li, Haozhe Chen, Hongseok Namkoong et al.

NEURIPS 2025arXiv:2503.16527

citations

#108

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Weizhe Yuan, Jane Yu, Song Jiang et al.

NEURIPS 2025arXiv:2502.13124

citations

#109

Atom of Thoughts for Markov LLM Test-Time Scaling

Fengwei Teng, Quan Shi, Zhaoyang Yu et al.

NEURIPS 2025arXiv:2502.12018

citations

#110

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NEURIPS 2025arXiv:2505.15778

citations

#111

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Dominic Maggio, Hyungtae Lim, Luca Carlone

NEURIPS 2025arXiv:2505.12549

citations

#112

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Ruili Feng, Han Zhang, Zhilei Shu et al.

NEURIPS 2025arXiv:2412.03568

citations

#113

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025oralarXiv:2504.02826

citations

#114

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025arXiv:2504.08600

citations

#115

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NEURIPS 2025spotlightarXiv:2505.24760

citations

#116

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.

NEURIPS 2025oralarXiv:2504.13180

citations

#117

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025oralarXiv:2506.07339

citations

#118

What Can RL Bring to VLA Generalization? An Empirical Study

Jijia Liu, Feng Gao, Bingwen Wei et al.

NEURIPS 2025arXiv:2505.19789

citations

#119

WritingBench: A Comprehensive Benchmark for Generative Writing

Yuning Wu, Jiahao Mei, Ming Yan et al.

NEURIPS 2025arXiv:2503.05244

citations

#120

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Hao Gao, Shaoyu Chen, Bo Jiang et al.

NEURIPS 2025arXiv:2502.13144

citations

#121

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025spotlightarXiv:2506.00413

citations

#122

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025spotlightarXiv:2503.01422

citations

#123

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NEURIPS 2025oralarXiv:2506.05284

citations

#124

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025arXiv:2506.09350

citations

#125

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025arXiv:2507.07966

citations

#126

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng, Yang Zhou, Brian Bartoldson et al.

NEURIPS 2025oralarXiv:2506.02177

citations

#127

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Jingyang Yi, Jiazheng Wang, Sida Li

NEURIPS 2025arXiv:2504.21370

citations

#128

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025arXiv:2506.03143

citations

#129

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.

NEURIPS 2025spotlightarXiv:2506.16791

citations

#130

Detecting Data Deviations in Electronic Health Records

Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi

NEURIPS 2025

citations

#131

Faster Video Diffusion with Trainable Sparse Attention

Peiyuan Zhang, Yongqi Chen, Haofeng Huang et al.

NEURIPS 2025arXiv:2505.13389

citations

#132

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, Zhi Ji, Zhaopeng Li et al.

NEURIPS 2025arXiv:2503.02453

citations

#133

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Weiji Xie, Jinrui Han, Jiakun Zheng et al.

NEURIPS 2025arXiv:2506.12851

citations

#134

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

Xiang Li, Feng Ruan, Huiyuan Wang et al.

NEURIPS 2025arXiv:2404.01245

citations

#135

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NEURIPS 2025

citations

#136

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025arXiv:2506.14965

citations

#137

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NEURIPS 2025arXiv:2503.09501

citations

#138

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Jiahui Zhang, Yurui Chen, Yueming Xu et al.

NEURIPS 2025arXiv:2503.22976

citations

#139

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NEURIPS 2025arXiv:2505.14631

citations

#140

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025spotlightarXiv:2505.18875

citations

#141

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NEURIPS 2025arXiv:2501.01895

citations

#142

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

Songjun Tu, Jiahao Lin, Qichao Zhang et al.

NEURIPS 2025arXiv:2505.10832

citations

#143

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NEURIPS 2025arXiv:2505.11475

citations

#144

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NEURIPS 2025arXiv:2505.14671

citations

#145

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NEURIPS 2025arXiv:2506.14603

citations

#146

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025arXiv:2507.02833

citations

#147

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae Bhartari, Bowen Li et al.

NEURIPS 2025arXiv:2502.00604

citations

#148

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, Feng Chen et al.

NEURIPS 2025arXiv:2407.21243

citations

#149

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NEURIPS 2025arXiv:2505.17412

citations

#150

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang, Bowen Wang, Dunjie Lu et al.

NEURIPS 2025spotlightarXiv:2508.09123

citations

#151

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NEURIPS 2025arXiv:2402.15751

citations

#152

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025arXiv:2502.20694

citations

#153

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NEURIPS 2025arXiv:2504.07891

citations

#154

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025arXiv:2505.10446

citations

#155

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025spotlightarXiv:2502.13143

citations

#156

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Andreas Auer, Patrick Podest, Daniel Klotz et al.

NEURIPS 2025arXiv:2505.23719

citations

#157

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025spotlightarXiv:2502.08640

citations

#158

Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning

Wenlin Zhang, Xiangyang Li, Kuicai Dong et al.

NEURIPS 2025arXiv:2505.14069

citations

#159

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025spotlightarXiv:2503.08153

citations

#160

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025arXiv:2505.19591

citations

#161

Harnessing the Universal Geometry of Embeddings

Rishi Jha, Collin Zhang, Vitaly Shmatikov et al.

NEURIPS 2025arXiv:2505.12540

citations

#162

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460

citations

#163

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025arXiv:2505.14521

citations

#164

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Yan Wu, Esther Wershof, Sebastian Schmon et al.

NEURIPS 2025arXiv:2408.10609

citations

#165

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.

NEURIPS 2025arXiv:2505.21523

citations

#166

On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge et al.

NEURIPS 2025arXiv:2502.10818

citations

#167

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025arXiv:2505.22647

citations

#168

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

Yantai Yang, Yuhao Wang, Zichen Wen et al.

NEURIPS 2025oralarXiv:2506.10100

citations

#169

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025spotlightarXiv:2501.06425

citations

#170

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434

citations

#171

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, Junyan Ye, Peilin Feng et al.

NEURIPS 2025arXiv:2503.14905

citations

#172

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li et al.

NEURIPS 2025arXiv:2410.01623

citations

#173

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.

NEURIPS 2025arXiv:2503.01822

citations

#174

The Leaderboard Illusion

Shivalika Singh, Yiyang Nan, Alex Wang et al.

NEURIPS 2025arXiv:2504.20879

citations

#175

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Yuqi Zhou, Sunhao Dai, Shuai Wang et al.

NEURIPS 2025arXiv:2505.15810

citations

#176

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025oralarXiv:2507.01467

citations

#177

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NEURIPS 2025arXiv:2505.20411

citations

#178

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025arXiv:2503.03961

citations

#179

MAT-Agent: Adaptive Multi-Agent Training Optimization

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025arXiv:2510.17845

citations

#180

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025arXiv:2505.14489

citations

#181

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Sangmin Bae, Yujin Kim, Reza Bayat et al.

NEURIPS 2025arXiv:2507.10524

citations

#182

Don't be lazy: CompleteP enables compute-efficient deep transformers

Nolan Dey, Bin Zhang, Lorenzo Noci et al.

NEURIPS 2025arXiv:2505.01618

citations

#183

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Zhongwei Wan, Zhihao Dou, Che Liu et al.

NEURIPS 2025arXiv:2506.01713

citations

#184

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Yiyou Sun, Shawn Hu, Georgia Zhou et al.

NEURIPS 2025arXiv:2506.18880

citations

#185

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025spotlightarXiv:2507.18624

citations

#186

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Guo Chen, Zhiqi Li, Shihao Wang et al.

NEURIPS 2025arXiv:2504.15271

citations

#187

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Hanlin Zhu, Shibo Hao, Zhiting Hu et al.

NEURIPS 2025arXiv:2505.12514

citations

#188

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025arXiv:2506.05302

citations

#189

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025arXiv:2502.00234

citations

#190

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Wenhui Tan, Jiaze Li, Jianzhong Ju et al.

NEURIPS 2025arXiv:2505.16552

citations

#191

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Hanxue Liang, Jiawei Ren, Ashkan Mirzaei et al.

NEURIPS 2025arXiv:2412.03526

citations

#192

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025arXiv:2506.09038

citations

#193

How to build a consistency model: Learning flow maps via self-distillation

Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden

NEURIPS 2025arXiv:2505.18825

citations

#194

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Jintao Zhang, Jia Wei, Haoxu Wang et al.

NEURIPS 2025spotlightarXiv:2505.11594

citations

#195

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NEURIPS 2025spotlightarXiv:2506.07398

citations

#196

Best-of-N Jailbreaking

John Hughes, Sara Price, Aengus Lynch et al.

NEURIPS 2025arXiv:2412.03556

citations

#197

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

Qiuchen Wang, Ruixue Ding, Yu Zeng et al.

NEURIPS 2025arXiv:2505.22019

citations

#198

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025arXiv:2503.20762

citations

#199

Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Ying Jin, Zhimei Ren, Zhuoran Yang et al.

NEURIPS 2025arXiv:2212.09900

citations

#200

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025arXiv:2506.05573

citations
