PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

9 citations


Ranked #798 of 3827 papers in ICLR 2025

Authors


Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, Ramya Vinayak

Topics

reward modeling, pluralistic alignment, personalized preferences, few-shot learning, text-to-text tasks, text-to-image tasks, sample efficiency

Abstract

Foundation models trained on internet-scale data benefit from extensive alignment to human preferences before deployment. However, existing methods typically assume a homogeneous preference shared by all individuals, overlooking the diversity inherent in human values. In this work, we propose a general reward modeling framework for pluralistic alignment (PAL), which incorporates diverse preferences from the ground up. PAL has a modular design that leverages commonalities across users while catering to individual personalization, enabling efficient few-shot localization of preferences for new users. Extensive empirical evaluation demonstrates that PAL matches or outperforms state-of-the-art methods on both text-to-text and text-to-image tasks. On Reddit TL;DR Summary, PAL is 1.7% more accurate for seen users and 36% more accurate for unseen users than the previous best method, with 100× fewer parameters. On Pick-a-Pic v2, PAL is 2.5% more accurate than the best method with 156× fewer learned parameters. Finally, we provide a theoretical analysis of the generalization of rewards learned via the PAL framework, showing the reduction in the number of samples needed per user.
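The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of combining shared structure with cheap per-user personalization. It is not the authors' implementation. It assumes, purely for illustration, that a small set of shared "prototype" reward functions has already been learned from seen users, that each prototype is linear in hand-made item features, and that a new user is described by a few mixture weights fitted to pairwise comparisons with a Bradley-Terry (logistic) loss. All names (PROTOTYPES, fit_user_weights, user_reward) and all numbers are hypothetical.

```python
import math

# Hypothetical shared prototypes, assumed to be already learned from seen users.
# Each prototype scores an item as a dot product with its feature vector.
PROTOTYPES = [
    [1.0, 0.0],  # prototype 0 rewards feature 0 (say, brevity)
    [0.0, 1.0],  # prototype 1 rewards feature 1 (say, detail)
]

def softmax(logits):
    """Map unconstrained logits to non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_rewards(prototypes, item):
    """Reward of one item under each shared prototype."""
    return [sum(p * x for p, x in zip(proto, item)) for proto in prototypes]

def user_reward(weights, prototypes, item):
    """Personalized reward: weighted mix of the shared prototype rewards."""
    rewards = prototype_rewards(prototypes, item)
    return sum(w * r for w, r in zip(weights, rewards))

def fit_user_weights(prototypes, comparisons, steps=500, lr=0.5):
    """Fit one user's mixture weights from (preferred, rejected) item pairs.

    Only len(prototypes) numbers are learned per user; the prototypes stay fixed.
    """
    k = len(prototypes)
    logits = [0.0] * k
    for _ in range(steps):
        w = softmax(logits)
        grad = [0.0] * k
        for preferred, rejected in comparisons:
            r_win = prototype_rewards(prototypes, preferred)
            r_lose = prototype_rewards(prototypes, rejected)
            diff = [a - b for a, b in zip(r_win, r_lose)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p_correct = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of -log(sigmoid(margin)) w.r.t. each logit (chain rule
            # through the softmax): -(1 - p) * w_j * (diff_j - margin).
            for j in range(k):
                grad[j] += -(1.0 - p_correct) * w[j] * (diff[j] - margin)
        logits = [l - lr * g / len(comparisons) for l, g in zip(logits, grad)]
    return softmax(logits)

# Three comparisons from a hypothetical new user who prefers detailed items.
comparisons = [
    ([0.2, 0.9], [0.9, 0.2]),
    ([0.1, 0.8], [0.7, 0.3]),
    ([0.3, 1.0], [1.0, 0.1]),
]
weights = fit_user_weights(PROTOTYPES, comparisons)
print("per-user mixture weights:", weights)
```

In this toy setup only two numbers are fitted for the new user, which is the sense in which a shared component plus a small per-user component can keep the number of comparisons needed per user low.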
