ICLR 2025 "automated evaluation framework" Papers
2 papers found
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
ICLR 2025 poster, arXiv:2410.02736, 207 citations
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
ICLR 2025 poster, arXiv:2406.04770, 142 citations