Most Cited 2025 Papers by Tomek Korbak
4 papers found
#1
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Javier Rando, Tony Wang, Stewart Slocum et al.
ICLR 2025 poster, arXiv:2307.15217
733 citations
#2
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
ICLR 2025 poster, arXiv:2306.09479
180 citations
#3
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix Jedidja Binder, James Chua, Tomek Korbak et al.
ICLR 2025 oral, arXiv:2410.13787
40 citations
#4
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Xander Davies, Eric Winsor, Alexandra Souly et al.
NeurIPS 2025 poster, arXiv:2502.14828