2025 "jailbreaking attacks" Papers
3 papers found
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Mintong Kang, Bo Li
ICLR 2025 (poster) · arXiv:2407.05557
34 citations
Attention! Your Vision Language Model Could Be Maliciously Manipulated
Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.
NeurIPS 2025 (poster) · arXiv:2505.19911
3 citations
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
ICLR 2025 (poster) · arXiv:2410.01524
13 citations