Poster "black-box defense" Papers
2 papers found
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
ICLR 2025, poster, arXiv:2410.20971
17 citations
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Biao Yi, Tiansheng Huang, Sishuo Chen et al.
ICLR 2025, poster, arXiv:2506.16447
21 citations