"trust region methods" Papers
4 papers found
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025 poster, arXiv:2404.09656
46 citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024 poster
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.
ICML 2024 poster
Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness
Chenghan Xie, Chenxi Li, Chuwen Zhang et al.
AAAI 2024 paper, arXiv:2310.17319
13 citations