2025 "off-policy reinforcement learning" Papers
3 papers found
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
ICLR 2025, poster, arXiv:2410.18252
39 citations
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Likun Wang, Xiangteng Zhang, Yinuo Wang et al.
NeurIPS 2025, poster, arXiv:2510.25529
Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control
Georgios Papoudakis, Thomas Coste, Jianye Hao et al.
NeurIPS 2025, poster, arXiv:2509.01720