ICML Poster Papers Matching "proximal policy optimization"
3 papers found
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024 poster · arXiv:2404.10719
Reflective Policy Optimization
Yaozhong Gan, Renye Yan, Zhe Wu et al.
ICML 2024 poster · arXiv:2406.03678
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
ICML 2024 poster · arXiv:2310.10505