Position: A Roadmap to Pluralistic Alignment

#10 in ICML 2024 (of 2,635 papers) · 12 authors

Abstract

With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using large language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also formalize and discuss three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks that incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks that explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
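To make the third notion concrete, here is a minimal illustrative sketch, not taken from the paper, of how distributional calibration might be scored on a single multiple-choice item: compare the model's probability over answer options to the fraction of a surveyed population choosing each option. The metric (Jensen-Shannon divergence) and the example distributions are assumptions for illustration only.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions over the same set of answer options."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical item: population's answer distribution vs. the model's.
human_dist = [0.55, 0.30, 0.15]
model_dist = [0.90, 0.08, 0.02]

# Lower divergence = better calibrated to the population on this item,
# i.e., more distributionally pluralistic (0 means a perfect match).
print(f"JS divergence: {js_divergence(human_dist, model_dist):.3f}")
```

Averaging such a score over many items and populations would give one possible operationalization of a distributional-pluralism benchmark; the paper's own formalization may differ.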
