Papers by Thomas Foster
3 papers found
LILO: Learning to Reason at the Frontier of Learnability
Thomas Foster, Anya Sims, Johannes Forkel et al.
NeurIPS 2025 poster
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
NeurIPS 2025 poster
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Bingchen Zhao, Despoina Magka, Minqi Jiang et al.
NeurIPS 2025 poster
2 citations