Poster "safety assessment" Papers
2 papers found
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.
ICLR 2025, poster, arXiv:2408.04811
11 citations
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
NeurIPS 2025, poster, arXiv:2511.04703
8 citations