NeurIPS "reinforcement learning" Papers

29 papers found

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NeurIPS 2025, spotlight, arXiv:2506.07259, 2 citations

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NeurIPS 2025, spotlight, arXiv:2412.11979, 9 citations

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Feichen Gan, Lu Youcun, Yingying Zhang et al.

NeurIPS 2025, oral, arXiv:2510.26026

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NeurIPS 2025, poster, arXiv:2510.19314, 1 citation

CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning

Yinjie Wang, Ling Yang, Ye Tian et al.

NeurIPS 2025, spotlight

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NeurIPS 2025, poster

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025, poster, arXiv:2511.00457

Heterogeneous Graph Transformers for Simultaneous Mobile Multi-Robot Task Allocation and Scheduling under Temporal Constraints

Batuhan Altundas, Shengkang Chen, Shivika Singh et al.

NeurIPS 2025, oral

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NeurIPS 2025, oral, arXiv:2506.13690

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025, poster, arXiv:2506.22434

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NeurIPS 2025, poster, arXiv:2510.21473

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025, poster, arXiv:2505.19591, 25 citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025, poster, arXiv:2311.01104, 4 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025, poster, arXiv:2508.16027

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025, poster, arXiv:2504.04160

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025, poster, arXiv:2507.11060

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025, poster, arXiv:2505.24161, 3 citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025, poster, arXiv:2505.16394, 18 citations

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NeurIPS 2025, poster, arXiv:2507.00971, 9 citations

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NeurIPS 2025, spotlight, arXiv:2505.21908, 3 citations

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NeurIPS 2025, poster, arXiv:2505.10446, 28 citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

NeurIPS 2025, poster, arXiv:2509.16500

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025, poster, arXiv:2506.00070, 9 citations

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025, poster, arXiv:2505.20347, 19 citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025, poster, arXiv:2508.01119, 2 citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025, poster, arXiv:2506.01347, 74 citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025, poster, arXiv:2501.04686, 29 citations

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NeurIPS 2025, oral, arXiv:2509.21100, 13 citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025, poster, arXiv:2505.22648, 81 citations