"language model benchmarking" Papers
3 papers found
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
NeurIPS 2025 poster | arXiv:2506.00794
5 citations
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NeurIPS 2025 poster | arXiv:2505.12575
10 citations
SWEb: A Large Web Dataset for the Scandinavian Languages
Tobias Norlund, Tim Isbister, Amaru Cuba Gyllensten et al.
ICLR 2025 poster | arXiv:2410.04456
1 citation