2025 "reward modeling" Papers
9 papers found
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
ICLR 2025 poster · arXiv:2404.02078
179 citations
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
NeurIPS 2025 poster · arXiv:2505.17017
25 citations
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
NeurIPS 2025 poster · arXiv:2510.01243
2 citations
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
ICLR 2025 poster · arXiv:2410.01257
103 citations
Measuring memorization in RLHF for code completion
Jamie Hayes, Ilia Shumailov, Billy Porter et al.
ICLR 2025 poster · arXiv:2406.11715
10 citations
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
ICLR 2025 poster
9 citations
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
ICLR 2025 poster
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
ICLR 2025 poster · arXiv:2407.06057
37 citations
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.
ICLR 2025 poster · arXiv:2405.16545
2 citations