Poster "preference optimization" Papers
52 papers found • Page 2 of 2
RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.
ICML 2024 • arXiv:2402.10893
14 citations
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.
ICML 2024 • arXiv:2401.01335
480 citations