"model vulnerabilities" Papers
3 papers found
EFFICIENT JAILBREAK ATTACK SEQUENCES ON LARGE LANGUAGE MODELS VIA MULTI-ARMED BANDIT-BASED CONTEXT SWITCHING
Aditya Ramesh, Shivam Bhardwaj, Aditya Saibewar et al.
ICLR 2025poster
3
citations
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
CVPR 2025highlightarXiv:2412.00782
4
citations
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath et al.
NeurIPS 2025posterarXiv:2506.22666