Poster Papers: "kl divergence penalty"
2 papers found
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICLR 2025, poster, arXiv:2403.03185
24 citations
Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration
Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.
ICML 2024, poster