ICML Poster Papers: "reinforcement learning from human feedback"

25 papers found

Active Preference Learning for Large Language Models

William Muldrew, Peter Hayes, Mingtian Zhang et al.

ICML 2024 · Poster · arXiv:2402.08114

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Gokul Swamy, Christoph Dann, Rahul Kidambi et al.

ICML 2024 · Poster · arXiv:2401.04056

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Gaurav Pandey, Yatin Nandwani, Tahira Naseem et al.

ICML 2024 · Poster · arXiv:2402.02479

Dense Reward for Free in Reinforcement Learning from Human Feedback

Alexander Chan, Hao Sun, Samuel Holt et al.

ICML 2024 · Poster · arXiv:2402.00782

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Yihan Du, Anna Winnicki, Gal Dalal et al.

ICML 2024 · Poster · arXiv:2402.10342

Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang et al.

ICML 2024 · Poster · arXiv:2405.16964

Fundamental Limitations of Alignment in Large Language Models

Yotam Wolf, Noam Wies, Oshri Avnery et al.

ICML 2024 · Poster · arXiv:2304.11082

How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Ryan Liu, Theodore R Sumers, Ishita Dasgupta et al.

ICML 2024 · Poster · arXiv:2402.07282

Human Alignment of Large Language Models through Online Preference Optimisation

Daniele Calandriello, Zhaohan Guo, Rémi Munos et al.

ICML 2024 · Poster · arXiv:2403.08635

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Shusheng Xu, Wei Fu, Jiaxuan Gao et al.

ICML 2024 · Poster · arXiv:2404.10719

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Banghua Zhu, Michael Jordan, Jiantao Jiao

ICML 2024 · Poster · arXiv:2401.16335

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Wei Xiong, Hanze Dong, Chenlu Ye et al.

ICML 2024 · Poster · arXiv:2312.11456

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Songyang Gao, Qiming Ge, Wei Shen et al.

ICML 2024 · Poster · arXiv:2401.11458

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 · Poster · arXiv:2402.08925

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 · Poster · arXiv:2301.11325

ODIN: Disentangled Reward Mitigates Hacking in RLHF

Lichang Chen, Chen Zhu, Jiuhai Chen et al.

ICML 2024 · Poster · arXiv:2402.07319

Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

Vincent Conitzer, Rachel Freedman, Jobst Heitzig et al.

ICML 2024 · Poster

Privacy-Preserving Instructions for Aligning Large Language Models

Da Yu, Peter Kairouz, Sewoong Oh et al.

ICML 2024 · Poster · arXiv:2402.13659

Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization

Li Ding, Jenny Zhang, Jeff Clune et al.

ICML 2024 · Poster · arXiv:2310.12103

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Ziniu Li, Tian Xu, Yushun Zhang et al.

ICML 2024 · Poster · arXiv:2310.10505

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.

ICML 2024 · Poster · arXiv:2403.01857

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Harrison Lee, Samrat Phatale, Hassan Mansoor et al.

ICML 2024 · Poster · arXiv:2309.00267

RLVF: Learning from Verbal Feedback without Overgeneralization

Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.

ICML 2024 · Poster · arXiv:2402.10893

WARM: On the Benefits of Weight Averaged Reward Models

Alexandre Ramé, Nino Vieillard, Léonard Hussenot et al.

ICML 2024 · Poster

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Collin Burns, Pavel Izmailov, Jan Kirchner et al.

ICML 2024 · Poster · arXiv:2312.09390