Poster "trust region methods" Papers
3 papers found
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025posterarXiv:2404.09656
46
citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024poster
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.
ICML 2024poster