Most Cited 2024 Highlight Papers by Tomek Korbak
3 papers found
#1
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Lukas Berglund, Meg Tong, Maximilian Kaufmann et al.
ICLR 2024 poster
#2
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomek Korbak et al.
ICLR 2024 poster · arXiv:2310.13548
#3
Compositional Preference Models for Aligning LMs
Dongyoung Go, Tomek Korbak, Germán Kruszewski et al.
ICLR 2024 poster · arXiv:2310.13011