"reward modeling" Papers

38 papers found

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 · arXiv:2404.02078
183 citations

Aligning Language Models Using Follow-up Likelihood as Reward Signal

Chen Zhang, Dading Chong, Feng Jiang et al.

AAAI 2025 · arXiv:2409.13948
6 citations

Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment

Yuang Cai, Yuyu Yuan, Jinsheng Shi et al.

AAAI 2025 · arXiv:2411.09341
4 citations

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 (spotlight) · arXiv:2507.18624
32 citations

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025 · arXiv:2405.13637
24 citations

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Hyojin Bahng, Caroline Chan, Fredo Durand et al.

ICCV 2025 · arXiv:2506.02095
7 citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 · arXiv:2505.17017
26 citations

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Yisong Xiao, Aishan Liu, Siyuan Liang et al.

NEURIPS 2025 · arXiv:2510.01243
2 citations

HelpSteer2-Preference: Complementing Ratings with Preferences

Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.

ICLR 2025 · arXiv:2410.01257
111 citations

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025 · arXiv:2504.14439
10 citations

MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models

Yujing Wang, Hainan Zhang, Liang Pang et al.

AAAI 2025 · arXiv:2408.17072
8 citations

Measuring memorization in RLHF for code completion

Jamie Hayes, I. Shumailov, Billy Porter et al.

ICLR 2025 · arXiv:2406.11715
10 citations

MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference

Haibo Tong, Zhaoyang Wang, Zhaorun Chen et al.

NEURIPS 2025 (spotlight)

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.

ICCV 2025 · arXiv:2507.21391
7 citations

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback

Johannes Ackermann, Takashi Ishida, Masashi Sugiyama

COLM 2025 · arXiv:2507.15507

Online-to-Offline RL for Agent Alignment

Xu Liu, Haobo Fu, Stefano V. Albrecht et al.

ICLR 2025

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025 · arXiv:2410.04070
36 citations

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Daiwei Chen, Yi Chen, Aniket Rege et al.

ICLR 2025
9 citations

Preference Distillation via Value based Reinforcement Learning

Minchan Kwon, Junwon Ko, Kangil Kim et al.

NEURIPS 2025 · arXiv:2509.16965

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.

NEURIPS 2025 · arXiv:2601.19055

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions

Simon Matrenok, Skander Moalla, Caglar Gulcehre

NEURIPS 2025 · arXiv:2507.08068
1 citation

ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning

Timo Kaufmann, Yannick Metz, Daniel Keim et al.

NEURIPS 2025 · arXiv:2512.25023

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

Hao Sun, Yunyi Shen, Jean-Francois Ton

ICLR 2025
27 citations

Reverse Engineering Human Preferences with Reinforcement Learning

Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.

NEURIPS 2025 (spotlight) · arXiv:2505.15795

Reward Learning from Multiple Feedback Types

Yannick Metz, Andras Geiszl, Raphaël Baur et al.

ICLR 2025 · arXiv:2502.21038
5 citations

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.

ICLR 2025 · arXiv:2410.01532
8 citations

Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens

Bohan Wang, Mingze Zhou, Zhongqi Yue et al.

NEURIPS 2025

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai et al.

NEURIPS 2025 · arXiv:2505.21496
12 citations

Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback

Zexu Sun, Yiju Guo, Yankai Lin et al.

ICLR 2025
5 citations

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025 · arXiv:2407.06057
38 citations

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.

ICLR 2025 · arXiv:2405.16545
2 citations

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024 · arXiv:2308.06394
263 citations

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.

ICML 2024 (spotlight) · arXiv:2307.08678
79 citations

Efficient Exploration for LLMs

Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.

ICML 2024 · arXiv:2402.00396
37 citations

HarmonyDream: Task Harmonization Inside World Models

Haoyu Ma, Jialong Wu, Ningya Feng et al.

ICML 2024 · arXiv:2310.00344
18 citations

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

JoonHo Lee, Jae Oh Woo, Juree Seok et al.

ICML 2024 · arXiv:2405.06424
3 citations

Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Zhixiong Zhuang, Irina Nicolae, Mario Fritz

ICML 2024 · arXiv:2405.07004
2 citations

Token-level Direct Preference Optimization

Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.

ICML 2024 · arXiv:2404.11999
120 citations