Papers by Florian Eddie Dorner
2 papers found
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
ICLR 2025 poster
23 citations
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian Eddie Dorner, Moritz Hardt
ICLR 2025 poster
arXiv:2407.07890