Deep Reinforcement Learning
Deep learning methods for reinforcement learning (RL)
Top Papers
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis et al.
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
ToolRL: Reward is All Tool Learning Needs
Cheng Qian, Emre Can Acikgoz, Qi He et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong et al.
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Mingjie Liu, Shizhe Diao, Ximing Lu et al.
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma, Qian Liu, Dongfu Jiang et al.
Learning to Act without Actions
Dominik Schmidt, Minqi Jiang
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang, Eric Tzeng, Yilun Du et al.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Seohong Park, Oleh Rybkin, Sergey Levine
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charlie Snell et al.
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu, Kangheng Lin, Liang Zhao et al.
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.
Simplifying Deep Temporal Difference Learning
Matteo Gallici, Mattie Fellows, Benjamin Ellis et al.
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX
Clément Bonnet, Daniel Luo, Donal Byrne et al.
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Hao Gao, Shaoyu Chen, Bo Jiang et al.
Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
Qifeng Li, Xiaosong Jia, Shaobo Wang et al.
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.
SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang, Jiaming Ji, Chunhe Xia et al.
CPPO: Continual Learning for Reinforcement Learning with Human Feedback
Han Zhang, Yu Lei, Lin Gui et al.
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
Zhiyuan Zhou, Andy Peng, Qiyang Li et al.
BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning
Jing Cui, Yufei Han, Yuzhe Ma et al.
Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
Dan Haramati, Tal Daniel, Aviv Tamar
Grounded Reinforcement Learning for Visual Reasoning
Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.
Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang et al.
Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning
Haoqi Yuan, Zhancun Mu, Feiyang Xie et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng, Wenjie Luo, Yiren Lu et al.
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.
Exploring the Promise and Limits of Real-Time Recurrent Learning
Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya Sukhija, Stelian Coros, Andreas Krause et al.
Self-Evolved Reward Learning for LLMs
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
Rating-Based Reinforcement Learning
Devin White, Mingkang Wu, Ellen Novoseller et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao et al.
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, Xinqiang Yu, Han Yang et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Yongyuan Liang, Yanchao Sun, Ruijie Zheng et al.
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
Chongyi Zheng, Benjamin Eysenbach, Homer Walke et al.
Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David-Hay, Lior Wolf
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.
UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution
Gengrui Zhang, Xiaoshuang Chen, Yao Wang et al.
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing
Jinmin He, Kai Li, Yifan Zang et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, Eric Eaton et al.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su et al.
Massively Scalable Inverse Reinforcement Learning in Google Maps
Matt Barnes, Matthew Abueg, Oliver Lange et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
Learning from Sparse Offline Datasets via Conservative Density Estimation
Zhepeng Cen, Zuxin Liu, Zitong Wang et al.
Unlocking the Power of Representations in Long-term Novelty-based Exploration
Alaa Saade, Steven Kapturowski, Daniele Calandriello et al.
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
Prioritized Generative Replay
Ren Wang, Kevin Frans, Pieter Abbeel et al.
Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Zizhao Wang, Caroline Wang, Xuesu Xiao et al.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Grace Liu, Michael Tang, Benjamin Eysenbach
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
Zhiyu Mei, Wei Fu, Jiaxuan Gao et al.
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
Vint Lee, Pieter Abbeel, Youngwoon Lee
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization
Zongkai Liu, Qian Lin, Chao Yu et al.
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Yinglun Xu, Qi Zeng, Gagandeep Singh
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Joey Hong, Anca Dragan, Sergey Levine
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng, Ruixi Qiao, Yingwei Ma et al.
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Minh-Tung Luu, Younghwan Lee, Donghoon Lee et al.
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans et al.
Identifying Policy Gradient Subspaces
Jan Schneider, Pierre Schumacher, Simon Guist et al.
Causally Aligned Curriculum Learning
Mingxuan Li, Junzhe Zhang, Elias Bareinboim
DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
Tongzhou Mu, Minghua Liu, Hao Su
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Jaehyeon Son, Soochan Lee, Gunhee Kim
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother et al.
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman, Marek Cygan, Carmelo Sferrazza et al.
Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
Guozheng Ma, Lu Li, Zilin Wang et al.
EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
Mujin Cheon, Jay Lee, Dong-Yeun Koh et al.
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
Are Expressive Models Truly Necessary for Offline RL?
Guan Wang, Haoyi Niu, Jianxiong Li et al.
Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
Mark Beliaev, Ramtin Pedarsani
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, Yixuan Liu, Takashi Isobe et al.
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic, Johannes Müller, Nico Scherf
REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes
David Ireland, Giovanni Montana
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue, Bowen Jin, Huimin Zeng et al.
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
Kwanyoung Park, Youngwoon Lee