NeurIPS 2025 "reinforcement learning" Papers

22 papers found

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NeurIPS 2025spotlightarXiv:2506.07259
2
citations

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NeurIPS 2025spotlightarXiv:2412.11979
9
citations

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NeurIPS 2025posterarXiv:2510.19314
1
citations

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025posterarXiv:2511.00457

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NeurIPS 2025oralarXiv:2506.13690

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025posterarXiv:2506.22434

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025posterarXiv:2505.19591
25
citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025posterarXiv:2311.01104
4
citations

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025posterarXiv:2504.04160

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025posterarXiv:2507.11060

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025posterarXiv:2505.24161
3
citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025posterarXiv:2505.16394
18
citations

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NeurIPS 2025posterarXiv:2507.00971
9
citations

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NeurIPS 2025spotlightarXiv:2505.21908
3
citations

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NeurIPS 2025posterarXiv:2505.10446
28
citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, xia zhou et al.

NeurIPS 2025posterarXiv:2509.16500

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025posterarXiv:2505.20347
19
citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025posterarXiv:2508.01119
2
citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025posterarXiv:2506.01347
74
citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025posterarXiv:2501.04686
29
citations

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NeurIPS 2025oralarXiv:2509.21100
13
citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025posterarXiv:2505.22648
81
citations