Poster Papers: "jailbreak attacks"
11 papers found
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes, Marco Christiani, David Shriver et al.
ICLR 2025 poster · arXiv:2412.13341
6 citations
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS 2025 poster · arXiv:2506.00781
3 citations
Durable Quantization Conditioned Misalignment Attack on Large Language Models
Peiran Dong, Haowei Li, Song Guo
ICLR 2025 poster
1 citation
Efficient Jailbreak Attack Sequences on Large Language Models via Multi-Armed Bandit-Based Context Switching
Aditya Ramesh, Shivam Bhardwaj, Aditya Saibewar et al.
ICLR 2025 poster
3 citations
Endless Jailbreaks with Bijection Learning
Brian R.Y. Huang, Max Li, Leonard Tang
ICLR 2025 poster · arXiv:2410.01294
14 citations
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
Ma Teng, Xiaojun Jia, Ranjie Duan et al.
ICCV 2025 poster · arXiv:2412.05934
21 citations
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Ruofan Wang, Juncheng Li, Yixu Wang et al.
ICCV 2025 poster · arXiv:2411.00827
8 citations
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
ICCV 2025 poster · arXiv:2501.04931
28 citations
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
NeurIPS 2025 poster · arXiv:2507.00971
9 citations
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
NeurIPS 2025 poster · arXiv:2505.06679
6 citations
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
ICML 2024 poster