Paper "reinforcement learning from human feedback" Papers
4 papers found
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
AAAI 2024 · arXiv:2310.02456
15 citations
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
AAAI 2024 · arXiv:2306.17492
334 citations
Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution
Emily McMilin
AAAI 2024 · arXiv:2210.00131
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.
AAAI 2024 · arXiv:2308.03549
204 citations