2025 "weak-to-strong generalization" Papers
2 papers found
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala
ICLR 2025posterarXiv:2412.03881
14
citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025posterarXiv:2410.18640
14
citations