Poster "actor-critic algorithms" Papers
3 papers found
$q$-exponential family for policy optimization
Lingwei Zhu, Haseeb Shah, Han Wang et al.
ICLR 2025, poster, arXiv:2408.07245
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother et al.
ICLR 2025, poster, arXiv:2411.07007
6 citations
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024, poster