Poster "reward modeling" Papers

16 papers found

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 poster · arXiv:2404.02078
179 citations

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025 poster · arXiv:2405.13637
21 citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025 poster · arXiv:2505.17017
25 citations

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Yisong Xiao, Aishan Liu, Siyuan Liang et al.

NeurIPS 2025 poster · arXiv:2510.01243
2 citations

HelpSteer2-Preference: Complementing Ratings with Preferences

Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.

ICLR 2025 poster · arXiv:2410.01257
103 citations

Measuring memorization in RLHF for code completion

Jamie Hayes, Ilia Shumailov, Billy Porter et al.

ICLR 2025 poster · arXiv:2406.11715
10 citations

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Daiwei Chen, Yi Chen, Aniket Rege et al.

ICLR 2025 poster
9 citations

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

Hao Sun, Yunyi Shen, Jean-Francois Ton

ICLR 2025 poster

Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback

Zexu Sun, Yiju Guo, Yankai Lin et al.

ICLR 2025 poster
3 citations

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025 poster · arXiv:2407.06057
37 citations

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.

ICLR 2025 poster · arXiv:2405.16545
2 citations

Efficient Exploration for LLMs

Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.

ICML 2024 poster

HarmonyDream: Task Harmonization Inside World Models

Haoyu Ma, Jialong Wu, Ningya Feng et al.

ICML 2024 poster

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

JoonHo Lee, Jae Oh Woo, Juree Seok et al.

ICML 2024 poster

Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Zhixiong Zhuang, Irina Nicolae, Mario Fritz

ICML 2024 poster

Token-level Direct Preference Optimization

Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.

ICML 2024 poster