Papers by Dan Vann
2 papers found
Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming
Alex Chouldechova, A. Feder Cooper, Solon Barocas et al.
NeurIPS 2025 (poster), arXiv:2601.18076
1 citation
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
Hanna Wallach, Meera Desai, A. Feder Cooper et al.
ICML 2025 (poster), arXiv:2502.00561