Poster "reward modeling" Papers
28 papers found
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, Ilia Shumailov, Billy Porter et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
Online-to-Offline RL for Agent Alignment
Xu Liu, Haobo Fu, Stefano V. Albrecht et al.
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
Preference Distillation via Value based Reinforcement Learning
Minchan Kwon, Junwon Ko, Kangil Kim et al.
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
Timo Kaufmann, Yannick Metz, Daniel Keim et al.
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
Reward Learning from Multiple Feedback Types
Yannick Metz, Andras Geiszl, Raphaël Baur et al.
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.
Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens
Bohan Wang, Mingze Zhou, Zhongqi Yue et al.
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai et al.
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin et al.
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh et al.
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
HarmonyDream: Task Harmonization Inside World Models
Haoyu Ma, Jialong Wu, Ningya Feng et al.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
Stealthy Imitation: Reward-guided Environment-free Policy Stealing
Zhixiong Zhuang, Maria-Irina Nicolae, Mario Fritz
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.