"policy alignment" Papers
3 papers found
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
ICLR 2025poster
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
NeurIPS 2025posterarXiv:2502.16320
8
citations
Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel Procaccia et al.
NeurIPS 2025posterarXiv:2506.06298