ICLR Poster "policy optimization" Papers
4 papers found
$q$-exponential family for policy optimization
Lingwei Zhu, Haseeb Shah, Han Wang et al.
ICLR 2025, poster, arXiv:2408.07245
Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.
ICLR 2025, poster, arXiv:2410.02275
5 citations
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren et al.
ICLR 2025, poster, arXiv:2409.13156
44 citations
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin et al.
ICLR 2025, poster
3 citations