Poster "rlhf" Papers
2 papers found
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
ICLR 2025 (poster) · arXiv:2410.01257
103 citations
Transforming and Combining Rewards for Aligning Large Language Models
Zihao Wang, Chirag Nagpal, Jonathan Berant et al.
ICML 2024 (poster) · arXiv:2402.00742