Most Cited 2024 Poster Papers by Tomek Korbak
3 papers found
#1
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomek Korbak et al.
ICLR 2024
arXiv:2310.13548
526 citations
#2
Compositional Preference Models for Aligning LMs
Dongyoung Go, Tomek Korbak, Germán Kruszewski et al.
ICLR 2024
arXiv:2310.13011
25 citations
#3
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Lukas Berglund, Meg Tong, Maximilian Kaufmann et al.
ICLR 2024