"white-box attacks" Papers
3 papers found
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
NeurIPS 2025 poster arXiv:2507.00971
9 citations
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
ECCV 2024 poster arXiv:2503.10191
Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual
Ruichu Cai, Yuxuan Zhu, Jie Qiao et al.
AAAI 2024 paper arXiv:2312.13628
5 citations