"benchmarking" Papers
3 papers found
Conference
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.
COLM 2025 | paper | arXiv:2410.23261
6 citations
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning
Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi et al.
COLM 2025 | paper | arXiv:2502.13820
5 citations
Yourbench: Dynamic Evaluation Set Generation with LLMs
Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskaya et al.
COLM 2025 | paper