ICLR poster papers matching "jailbreaking attacks"
4 papers found
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Mintong Kang, Bo Li
ICLR 2025 poster · arXiv:2407.05557
34 citations
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
ICLR 2025 poster
6 citations
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
ICLR 2025 poster · arXiv:2410.01524
13 citations
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Runqi Lin, Bo Han, Fengwang Li et al.
ICLR 2025 poster · arXiv:2502.03052
16 citations