"adversarial prompting" Papers
3 papers found
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
ICLR 2025 (poster) · arXiv:2404.02151
387 citations
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Hao Di, Tong He, Haishan Ye et al.
ICLR 2025 (poster)
2 citations
The Right to Red-Team: Adversarial AI Literacy as a Civic Imperative in K-12 Education
Devan Walton, Haesol Bae
NeurIPS 2025 (poster)