On a Connection Between Imitation Learning and RLHF

13 citations

arXiv:2503.05079

Citations

#715 in ICLR 2025 of 3827 papers

Authors

Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, Vasant Honavar

Topics

imitation learning, reinforcement learning from human feedback, preference data alignment, large language model alignment, theoretical connection, direct imitation learning, alignment algorithms

Abstract

This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning from human feedback (RLHF) and imitation learning (IL), revealing that RLHF implicitly performs imitation learning on the preference data distribution. Building on this connection, we propose DIL, a principled framework that directly optimizes the imitation learning objective. DIL provides a unified imitation learning perspective on alignment, encompassing existing alignment algorithms as special cases while naturally introducing new variants. By bridging IL and RLHF, DIL offers new insights into alignment with RLHF. Extensive experiments demonstrate that DIL outperforms existing methods on various challenging benchmarks.

Citation History

[Citation history chart; date axis: Jan 26, 2026, Jan 27, 2026, Feb 1, 2026]