ICLR Poster "automated evaluation framework" Papers
3 papers found
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky et al.
ICLR 2025, poster, arXiv:2410.16701
3 citations
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
ICLR 2025, poster, arXiv:2410.02736
207 citations
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
ICLR 2025, poster, arXiv:2406.04770
142 citations