2024 "language model alignment" Papers

13 papers found

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Gaurav Pandey, Yatin Nandwani, Tahira Naseem et al.

ICML 2024 · poster

Controlled Decoding from Language Models

Sidharth Mudgal, Jong Lee, Harish Ganapathy et al.

ICML 2024 · poster

Customizing Language Model Responses with Contrastive In-Context Learning

Xiang Gao, Kamalika Das

AAAI 2024 · paper · arXiv:2401.17390
19 citations

Decoding-time Realignment of Language Models

Tianlin Liu, Shangmin Guo, Leonardo Martins Bianco et al.

ICML 2024 · spotlight

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.

ICML 2024 · poster

Human Alignment of Large Language Models through Online Preference Optimisation

Daniele Calandriello, Zhaohan Guo, Rémi Munos et al.

ICML 2024 · poster

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Banghua Zhu, Michael Jordan, Jiantao Jiao

ICML 2024 · poster

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Songyang Gao, Qiming Ge, Wei Shen et al.

ICML 2024 · poster

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 · poster

ODIN: Disentangled Reward Mitigates Hacking in RLHF

Lichang Chen, Chen Zhu, Jiuhai Chen et al.

ICML 2024 · poster

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan

ICML 2024 · poster

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024 · poster

Towards Efficient Exact Optimization of Language Model Alignment

Haozhe Ji, Cheng Lu, Yilin Niu et al.

ICML 2024 · poster