2025 "reinforcement learning" Papers
131 papers found • Page 2 of 3
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
Qianyue Hao, Yiwen Song, Qingmin Liao et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta, Karmesh Yadav, Zsolt Kira et al.
Meta-learning how to Share Credit among Macro-Actions
Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Wayne Wu, Honglin He, Jack He et al.
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu et al.
Modelling the control of offline processing with reinforcement learning
Eleanor Spens, Neil Burgess, Tim Behrens
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization
Chenglong Wang, Yang Gan, Hang Zhou et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
Neuroplastic Expansion in Deep Reinforcement Learning
Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation
Longtian Qiu, Shan Ning, Jiaxuan Sun et al.
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu, Jiyuan Tu, Xi Chen et al.
Online-to-Offline RL for Agent Alignment
Xu Liu, Haobo Fu, Stefano V. Albrecht et al.
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
On the Sample Complexity of Differentially Private Policy Optimization
Yi He, Xingyu Zhou
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
Open-World Drone Active Tracking with Goal-Centered Rewards
Haowei Sun, Jinwu Hu, Zhirui Zhang et al.
Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Baiyuan Chen, Shinji Ito, Masaaki Imaizumi
OptionZero: Planning with Learned Options
Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning
Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
Periodic Skill Discovery
Jonghae Park, Daesol Cho, Jusuk Lee et al.
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.
Policy Gradient with Kernel Quadrature
Tetsuro Morimura, Satoshi Hayakawa
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Mingjie Liu, Shizhe Diao, Ximing Lu et al.
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Yiyang Zhou, Yangfan He, Yaofeng Su et al.
Real-World Reinforcement Learning of Active Perception Behaviors
Edward Hu, Jie Wang, Xingfang Yuan et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model
Yicong Chen, Jiahua Rao, Jiancong Xie et al.
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang, Peijia Li, Furao Shen et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach
Chenbei Lu, Zaiwei Chen, Tongxin Li et al.
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang et al.
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan Rodriguez, Haotian Zhang, Abhay Puri et al.
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.
Retro-R1: LLM-based Agentic Retrosynthesis
Wei Liu, Jiangtao Feng, Hongli Yu et al.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.
REvolve: Reward Evolution with Large Language Models using Human Feedback
RISHI HAZRA, Alkis Sygkounas, Andreas Persson et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, xia zhou et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Haozhen Zhang, Tao Feng, Jiaxuan You