ICLR 2025 "model alignment" Papers
2 papers found
Large Language Models Assume People are More Rational than We Really are
Ryan Liu, Jiayi Geng, Joshua Peterson et al.
ICLR 2025 (poster) · arXiv:2406.17055
37 citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025 (poster) · arXiv:2410.18640
14 citations