2025 Poster "model vulnerabilities" Papers
2 papers found
Efficient Jailbreak Attack Sequences on Large Language Models via Multi-Armed Bandit-Based Context Switching
Aditya Ramesh, Shivam Bhardwaj, Aditya Saibewar et al.
ICLR 2025 poster
3 citations
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath et al.
NeurIPS 2025 poster, arXiv:2506.22666