Poster "llm evaluation" Papers
6 papers found
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
ICLR 2025, poster, arXiv:2310.17631
260 citations
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck, Maximilian Baader, Martin Vechev
ICLR 2025, poster, arXiv:2409.00696
3 citations
Probing Hidden Knowledge Holes in Unlearned LLMs
Myeongseob Ko, Hoang Anh Just, Charles Fleming et al.
NEURIPS 2025, poster
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Xeron Du, Yifan Yao, Kaijing Ma et al.
NEURIPS 2025, poster, arXiv:2502.14739
118 citations
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Jiyoung Lee, Seungho Kim, Jieun Han et al.
NEURIPS 2025, poster, arXiv:2505.20875
3 citations
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
ICML 2024, poster, arXiv:2403.04132