ICLR Poster "off-policy reinforcement learning" Papers
2 papers found
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
ICLR 2025posterarXiv:2410.18252
39
citations
Zero-shot Model-based Reinforcement Learning using Large Language Models
Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat et al.
ICLR 2025posterarXiv:2410.11711