"preference-based learning" Papers
3 papers found
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024 poster
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A. Suttle, Brian Sadler et al.
ICML 2024 poster
Rating-Based Reinforcement Learning
Devin White, Mingkang Wu, Ellen Novoseller et al.
AAAI 2024 paper, arXiv:2307.16348
13 citations