NEURIPS "language model benchmarking" Papers
2 papers found
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
NEURIPS 2025 poster · arXiv:2506.00794
5 citations
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NEURIPS 2025 poster · arXiv:2505.12575
10 citations