RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

17 citations

Authors

Hanyang Zhao, Genta Winata, Anirban Das, Shi-Xiong Zhang, David Yao, Wenpin Tang, Sambit Sahu

Abstract

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern which components genuinely enhance downstream performance. In this work, we propose RainbowPO, a unified framework that demystifies the effectiveness of existing DPO methods by categorizing their key components into seven broad directions. We integrate these components into a single cohesive objective, enhancing the performance of each individual element. Through extensive experiments, we demonstrate that RainbowPO outperforms existing DPO variants. Additionally, we provide insights to guide researchers in developing new DPO methods and assist practitioners in their implementations.
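For orientation, the baseline that these variants extend can be sketched as follows. This is a minimal illustration of the standard DPO loss for a single preference pair, not the RainbowPO objective, which the abstract does not spell out. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example; the inputs are assumed to be summed sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    All four inputs are sequence log-probabilities (floats). `beta` scales
    the implicit reward; 0.1 is an assumed default, not a value from the paper.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, measured relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy matches the reference model the margin is zero and the loss is log 2; the loss shrinks as the policy shifts probability toward the chosen response. Variants in the DPO family typically modify pieces of this expression, such as the reference term, the scaling, or the link function.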
