"automatic evaluation" Papers
3 papers found
Conference
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
AAAI 2025 · arXiv:2408.08781
37 citations
M-Prometheus: A Suite of Open Multilingual LLM Judges
José Pombal, Dongkeun Yoon, Patrick Fernandes et al.
COLM 2025 · arXiv:2504.04953
23 citations
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José Pombal, Nuno M Guerreiro, Ricardo Rei et al.
COLM 2025 · arXiv:2504.01001
8 citations