Adversarial Attacks
Crafting adversarial examples
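As a concrete illustration of "crafting adversarial examples," the sketch below shows the classic fast gradient sign method (FGSM): take one gradient step that increases the classifier's loss, then clip the result back to the valid pixel range. This is a minimal sketch assuming a generic PyTorch image classifier; the model, image, label, and epsilon arguments are placeholders, not an implementation from any paper listed below.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, image, label, epsilon=8 / 255):
        # One-step FGSM: perturb the input in the direction that increases the loss.
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        adv_image = image + epsilon * image.grad.sign()
        # Clip back to the valid [0, 1] pixel range before returning.
        return adv_image.clamp(0.0, 1.0).detach()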
Related Topics (Robustness)
Top Papers
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?
Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie et al.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao, Lingbo Mo, Chejian Xu et al.
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei et al.
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, G. Edward Suh et al.
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen, Yichi Zhang, Yinpeng Dong et al.
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, Hangyu Guo, Kun Zhou et al.
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li et al.
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks
Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min et al.
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif et al.
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing, Rui Wang, Wenqi Ren et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun, Lingyun Yu, Hongtao Xie et al.
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Hönig, Javier Rando, Nicholas Carlini et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin, Wentang Song, Bin Li et al.
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
Siwei Wen, Junyan Ye, Peilin Feng et al.
Adversarial Search Engine Optimization for Large Language Models
Fredrik Nestaas, Edoardo Debenedetti, Florian Tramèr
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Xiao Li, Wenxuan Sun, Huanran Chen et al.
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Mintong Kang, Chejian Xu, Bo Li
A Transfer Attack to Image Watermarks
Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
Chengqian Gao, Haonan Li, Liu Liu et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Ruoxi Chen, Haibo Jin, Yixin Liu et al.
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
Yumeng Li, Margret Keuper, Dan Zhang et al.
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark
Mengxi Ya, Yiming Li, Tao Dai et al.
Revisiting Adversarial Training Under Long-Tailed Distributions
Xinli Yue, Ningping Mou, Qian Wang et al.
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
Memory Injection Attacks on LLM Agents via Query-Only Interaction
Shen Dong, Shaochen Xu, Pengfei He et al.
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise
Yixin Liu, Kaidi Xu, Xun Chen et al.
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Runqi Lin, Bo Han, Fengwang Li et al.
Progressive Poisoned Data Isolation for Training-Time Backdoor Defense
Yiming Chen, Haiwei Wu, Jiantao Zhou
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin Wu, Francesco Pinto et al.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.
Adversarial Training Should Be Cast as a Non-Zero-Sum Game
Alex Robey, Fabian Latorre, George Pappas et al.
Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia, Sensen Gao, Simeng Qin et al.
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu, Zitong Yu et al.
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
CL-Attack: Textual Backdoor Attacks via Cross-Lingual Triggers
Jingyi Zheng, Tianyi Hu, Tianshuo Cong et al.
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang, Lei Zhang
Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Géraldin Nanfack, Alexander Fulleringer, Jonathan Marty et al.
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin, Wenjie Ruan
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min, Long H. Pham, Yige Li et al.
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios
Ziqiang Li, Hong Sun, Pengfei Xia et al.
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen, Dongcheng Zhao, Yiting Dong et al.
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
Zhaobo Qi, Yibo Yuan, Xiaowen Ruan et al.
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
Zihan Guan, Mengxuan Hu, Ronghang Zhu et al.
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung, Hongsun Jang, Jaeyong Song et al.
ProSec: Fortifying Code LLMs with Proactive Security Alignment
Xiangzhe Xu, Zian Su, Jinyao Guo et al.
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu, Zikai Zhang, Rui Hu
Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack
Mingyu Yang, Daizong Liu, Keke Tang et al.
Backdoor Contrastive Learning via Bi-level Trigger Optimization
Weiyu Sun, Xinyu Zhang, Hao Lu et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
Demystifying Poisoning Backdoor Attacks from a Statistical Perspective
Ganghua Wang, Xun Xian, Ashish Kundu et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
Yanmeng Yao, Xiaohan Zhao, Bin Gu
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL
Xiangyu Liu, Souradip Chakraborty, Yanchao Sun et al.
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
Zhaoxin Wang, Handing Wang, Cong Tian et al.
Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
Si-Qi Liu, Qirui Wang, Pong Chi Yuen
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Danni Yuan, Mingda Zhang, Shaokui Wei et al.
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
Nicholas Carlini, Edoardo Debenedetti, Javier Rando et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
IPRemover: A Generative Model Inversion Attack against Deep Neural Network Fingerprinting and Watermarking
Wei Zong, Yang-Wai Chow, Willy Susilo et al.
LoRID: Low-Rank Iterative Diffusion for Adversarial Purification
Geigh Zollicoffer, Minh N. Vu, Ben Nebgen et al.
MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
Jimin Park, AHyun Ji, Minji Park et al.
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
Honglei Miao, Fan Ma, Ruijie Quan et al.
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen, Zhou Feng, Rui Zeng et al.
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Zhuowei Chen, Qiannan Zhang, Shichao Pei
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
Youheng Sun, Shengming Yuan, Xuanhan Wang et al.
Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks
Yiyi Chen, Russa Biswas, Heather Lent et al.
On the Robustness of Neural-Enhanced Video Streaming against Adversarial Attacks
Qihua Zhou, Jingcai Guo, Song Guo et al.