Poster "trigger detection" Papers
3 papers found
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou, Songze Li, Duanyi Yao
CVPR 2025, poster, arXiv:2411.16154
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Biao Yi, Tiansheng Huang, Sishuo Chen et al.
ICLR 2025, poster, arXiv:2506.16447
21 citations
Causality Based Front-door Defense Against Backdoor Attack on Language Models
Yiran Liu, Xiaoang Xu, Zhiyi Hou et al.
ICML 2024, poster