NeurIPS "large language model evaluation" Papers
2 papers found
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
Anna Sokol, Elizabeth Daly, Michael Hind et al.
NeurIPS 2025, poster, arXiv:2410.12974
2 citations
How Benchmark Prediction from Fewer Data Misses the Mark
Guanhua Zhang, Florian E. Dorner, Moritz Hardt
NeurIPS 2025, poster, arXiv:2506.07673
4 citations