"reward modeling" Papers
38 papers found
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang, Dading Chong, Feng Jiang et al.
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
Yuang Cai, Yuyu Yuan, Jinsheng Shi et al.
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models
Yujing Wang, Hainan Zhang, Liang Pang et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, Ilia Shumailov, Billy Porter et al.
MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference
Haibo Tong, Zhaoyang Wang, Zhaorun Chen et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
Online-to-Offline RL for Agent Alignment
Xu Liu, Haobo Fu, Stefano V. Albrecht et al.
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
Preference Distillation via Value based Reinforcement Learning
Minchan Kwon, Junwon Ko, Kangil Kim et al.
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
Timo Kaufmann, Yannick Metz, Daniel Keim et al.
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
Reverse Engineering Human Preferences with Reinforcement Learning
Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.
Reward Learning from Multiple Feedback Types
Yannick Metz, Andras Geiszl, Raphaël Baur et al.
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.
Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens
Bohan Wang, Mingze Zhou, Zhongqi Yue et al.
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai et al.
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin et al.
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
HarmonyDream: Task Harmonization Inside World Models
Haoyu Ma, Jialong Wu, Ningya Feng et al.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
Stealthy Imitation: Reward-guided Environment-free Policy Stealing
Zhixiong Zhuang, Irina Nicolae, Mario Fritz
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.