Poster "safety vulnerabilities" Papers
3 papers found
Endless Jailbreaks with Bijection Learning
Brian R.Y. Huang, Max Li, Leonard Tang
ICLR 2025posterarXiv:2410.01294
14
citations
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
NeurIPS 2025posterarXiv:2507.00971
9
citations
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
ICML 2024poster