"evaluation framework" Papers
4 papers found
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu et al.
NeurIPS 2025posterarXiv:2506.05735
6
citations
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
ICLR 2025posterarXiv:2410.12784
150
citations
Assessing Large Language Models on Climate Information
Jannis Bulian, Mike Schäfer, Afra Amini et al.
ICML 2024poster
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin et al.
ICML 2024poster