"llm evaluation" Papers
3 papers found
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck, Maximilian Baader, Martin Vechev
ICLR 2025 poster, arXiv:2409.00696
3 citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Xeron Du, Yifan Yao, Kaijing Ma et al.
NeurIPS 2025 poster, arXiv:2502.14739
118 citations
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
ICML 2024 poster