"reinforcement learning" Papers

300 papers found • Page 3 of 6

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback

Johannes Ackermann, Takashi Ishida, Masashi Sugiyama

COLM 2025 • paper • arXiv:2507.15507

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

Hao Zhong, Muzhi Zhu, Zongze Du et al.

NEURIPS 2025 • oral • arXiv:2505.20256
14 citations

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Weidong Liu, Jiyuan Tu, Xi Chen et al.

NEURIPS 2025 • arXiv:2310.02581
5 citations

Online Reinforcement Learning in Non-Stationary Context-Driven Environments

Pouya Hamadanian, Arash Nasr-Esfahany, Malte Schwarzkopf et al.

ICLR 2025 • arXiv:2302.02182
3 citations

Online-to-Offline RL for Agent Alignment

Xu Liu, Haobo Fu, Stefano V. Albrecht et al.

ICLR 2025

On Shallow Planning Under Partial Observability

Randy Lefebvre, Audrey Durand

AAAI 2025 • paper • arXiv:2407.15820
2 citations

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025 • arXiv:2411.02306
43 citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NEURIPS 2025 • arXiv:2311.01104
4 citations

On the Sample Complexity of Differentially Private Policy Optimization

Yi He, Xingyu Zhou

NEURIPS 2025 • arXiv:2510.21060

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Jingcheng Hu, Yinmin Zhang, Qi Han et al.

NEURIPS 2025 • arXiv:2503.24290
347 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NEURIPS 2025 • arXiv:2503.17352
16 citations

Open-World Drone Active Tracking with Goal-Centered Rewards

Haowei Sun, Jinwu Hu, Zhirui Zhang et al.

NEURIPS 2025 • arXiv:2412.00744
2 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NEURIPS 2025 • arXiv:2508.16027

OptionZero: Planning with Learned Options

Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.

ICLR 2025 • arXiv:2502.16634
2 citations

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NEURIPS 2025 • arXiv:2504.04160

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NEURIPS 2025 • arXiv:2410.07170
6 citations

Pareto Prompt Optimization

Guang Zhao, Byung-Jun Yoon, Gilchan Park et al.

ICLR 2025
1 citation

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NEURIPS 2025 • spotlight • arXiv:2505.15201
28 citations

Periodic Skill Discovery

Jonghae Park, Daesol Cho, Jusuk Lee et al.

NEURIPS 2025 • oral • arXiv:2511.03187

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NEURIPS 2025 • arXiv:2507.11060

Policy Gradient with Kernel Quadrature

Tetsuro Morimura, Satoshi Hayakawa

ICLR 2025 • arXiv:2310.14768
1 citation

Preference Distillation via Value based Reinforcement Learning

Minchan Kwon, Junwon Ko, Kangil Kim et al.

NEURIPS 2025 • arXiv:2509.16965

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NEURIPS 2025

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025 • arXiv:2505.24864
104 citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NEURIPS 2025 • arXiv:2505.24161
4 citations

RAST: Reasoning Activation in LLMs via Small-model Transfer

Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.

NEURIPS 2025 • arXiv:2506.15710
2 citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NEURIPS 2025 • arXiv:2505.16394
21 citations

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025 • arXiv:2506.01300
29 citations

Real-World Reinforcement Learning of Active Perception Behaviors

Edward Hu, Jie Wang, Xingfang Yuan et al.

NEURIPS 2025 • arXiv:2512.01188

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NEURIPS 2025 • arXiv:2507.00971
11 citations

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

Stephen Zhao, Aidan Li, Rob Brekelmans et al.

NEURIPS 2025 • arXiv:2510.21184

Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model

Yicong Chen, Jiahua Rao, Jiancong Xie et al.

NEURIPS 2025

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NEURIPS 2025 • spotlight • arXiv:2505.21908
5 citations

Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards

Zhaohui Jiang, Xuening Feng, Paul Weng et al.

ICLR 2025 • arXiv:2410.05782
3 citations

Reinforcement Learning-Guided Data Selection via Redundancy Assessment

Suorong Yang, Peijia Li, Furao Shen et al.

ICCV 2025 • arXiv:2506.21037
1 citation

Reinforcement learning with combinatorial actions for coupled restless bandits

Lily Xu, Bryan Wilder, Elias Khalil et al.

ICLR 2025 • arXiv:2503.01919
6 citations

Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach

Chenbei Lu, Zaiwei Chen, Tongxin Li et al.

NEURIPS 2025 • spotlight • arXiv:2510.18687
1 citation

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025 • arXiv:2505.10446
37 citations

REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization

Huyen Nguyen, Hieu Dam, Nguyen Hoang Khoi Do et al.

AAAI 2025 • paper • arXiv:2501.00779
1 citation

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan Rodriguez, Haotian Zhang, Abhay Puri et al.

NEURIPS 2025 • arXiv:2505.20793
9 citations

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025 • arXiv:2503.19470
57 citations

Retro-R1: LLM-based Agentic Retrosynthesis

Wei Liu, Jiangtao Feng, Hongli Yu et al.

NEURIPS 2025

Reverse Engineering Human Preferences with Reinforcement Learning

Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.

NEURIPS 2025 • spotlight • arXiv:2505.15795

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025 • arXiv:2506.14965
40 citations

REvolve: Reward Evolution with Large Language Models using Human Feedback

Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.

ICLR 2025 • arXiv:2406.01309
8 citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

NEURIPS 2025 • arXiv:2509.16500
4 citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025 • arXiv:2506.00070
10 citations

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Haozhen Zhang, Tao Feng, Jiaxuan You

NEURIPS 2025 • arXiv:2506.09033
17 citations

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.

COLM 2025 • paper • arXiv:2505.20737
1 citation

Safety Representations for Safer Policy Learning

Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.

ICLR 2025 • arXiv:2502.20341
1 citation