2024 by Tomek Korbak Papers
3 papers found
Compositional Preference Models for Aligning LMs
DONGYOUNG GO, Tomek Korbak, Germàn Kruszewski et al.
ICLR 2024posterarXiv:2310.13011
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Lukas Berglund, Meg Tong, Maximilian Kaufmann et al.
ICLR 2024poster
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomek Korbak et al.
ICLR 2024posterarXiv:2310.13548