NeurIPS Spotlight "language model evaluation" Papers
2 papers found
AbsenceBench: Language Models Can’t See What’s Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.
NeurIPS 2025 spotlight
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson et al.
NeurIPS 2025 spotlight, arXiv:2508.13144
4 citations