"model evaluation" Papers
5 papers found
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan
ICML 2024 poster
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan, Erik Jones, Meena Jagadeesan et al.
ICML 2024 poster
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang, Moritz Hardt
ICML 2024 oral
Interplay of ROC and Precision-Recall AUCs: Theoretical Limits and Practical Implications in Binary Classification
Martin Mihelich, François Castagnos, Charles Dognin
ICML 2024 poster
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei, Xi Chen, Lin Luo
ICML 2024 poster