2024 Poster "preference optimization" Papers
7 papers found
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi et al.
ICML 2024 poster
Can AI Assistants Know What They Don't Know?
Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu et al.
ICML 2024 poster arXiv:2401.13275
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.
ICML 2024 poster arXiv:2402.05749
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello, Zhaohan Guo, Rémi Munos et al.
ICML 2024 poster
Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Songtao Liu, Hanjun Dai, Yue Zhao et al.
ICML 2024 poster
RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.
ICML 2024 poster
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.
ICML 2024 poster