Adversarial Attacks
Crafting adversarial examples
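As a quick illustration of what "crafting adversarial examples" means in practice, here is a minimal FGSM-style sketch (Goodfellow et al., 2015) in PyTorch; the toy classifier, input shapes, and epsilon are illustrative placeholders rather than the setup of any paper listed below.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    # One signed-gradient step of size epsilon, then clamp the
    # perturbed input back to the valid image range [0, 1].
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Toy usage: a linear classifier on random 32x32 "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)            # batch of inputs in [0, 1]
y = torch.randint(0, 10, (4,))          # ground-truth labels
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())   # bounded by epsilon

The final print confirms that no pixel moves by more than epsilon, the defining constraint of an L-infinity-bounded attack.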
Related Topics: Robustness
Top Papers
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen, Yichi Zhang, Yinpeng Dong et al.
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks
Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif et al.
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing, Rui Wang, Wenqi Ren et al.
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Hönig, Javier Rando, Nicholas Carlini et al.
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka
Adversarial Search Engine Optimization for Large Language Models
Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Xiao Li, Wenxuan Sun, Huanran Chen et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Runqi Lin, Bo Han, Fengwang Li et al.
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise
Yixin Liu, Kaidi Xu, Xun Chen et al.
Adversarial Training Should Be Cast as a Non-Zero-Sum Game
Alex Robey, Fabian Latorre, George Pappas et al.
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia, Sensen Gao, Simeng Qin et al.
Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios
Ziqiang Li, Hong Sun, Pengfei Xia et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
Demystifying Poisoning Backdoor Attacks from a Statistical Perspective
Ganghua Wang, Xun Xian, Ashish Kundu et al.
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL
Xiangyu Liu, Souradip Chakraborty, Yanchao Sun et al.
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
Honglei Miao, Fan Ma, Ruijie Quan et al.
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen, Zhou Feng, Rui Zeng et al.
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses
Nicholas Carlini, Edoardo Debenedetti, Javier Rando et al.
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
Yaling Shen, Zhixiong Zhuang, Kun Yuan et al.
On the Robustness of Neural-Enhanced Video Streaming against Adversarial Attacks
Qihua Zhou, Jingcai Guo, Song Guo et al.
Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
Youheng Sun, Shengming Yuan, Xuanhan Wang et al.
Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning
Qingqing Fang, Qinliang Su, Wenxi Lv et al.
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li, Kaisheng Liang, Bin Xiao
Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples
Ruichu Cai, Yuxuan Zhu, Jie Qiao et al.
Value at Adversarial Risk: A Graph Defense Strategy against Cost-Aware Attacks
Junlong Liao, Wenda Fu, Cong Wang et al.
Adversarial Feature Map Pruning for Backdoor
Dong Huang, Qingwen Bu
Theoretical Understanding of Learning from Adversarial Perturbations
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
Understanding Model Ensemble in Transferable Adversarial Attack
Wei Yao, Zeliang Zhang, Huayi Tang et al.
A Robust Diffusion Model-based Targeted Adversarial Attack Against Deep Hashing Retrieval
Chihan Huang, Xiaobo Shen
Enhancing Robustness in Incremental Learning with Adversarial Training
Seungju Cho, Hongsin Lee, Changick Kim
When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning
Naheed Anjum Arafat, Debabrota Basu, Yulia Gel et al.
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo, Li Hu, Yucheng Long et al.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li, Yiming Zhu, Yifan Huang et al.
A Sample-Level Evaluation and Generative Framework for Model Inversion Attacks
Haoyang Li, Li Bai, Qingqing Ye et al.
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks
Teng Li, Xingjun Ma, Yu-Gang Jiang
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Hao Di, Tong He, Haishan Ye et al.
Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento, Chuan Sheng Foo, See-Kiong Ng
Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld
A Unified, Resilient, and Explainable Adversarial Patch Detector
Vishesh Kumar, Akshay Agarwal
Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
Tao Lin, Lijia Yu, Gaojie Jin et al.
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Yuan Gan, Jiaxu Miao, Yunze Wang et al.
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng, Zeze Tao, Huibing Wang et al.
Detecting Adversarial Data Using Perturbation Forgery
Qian Wang, Chen Li, Yuchen Luo et al.
Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior
Chanhui Lee, Yeonghwan Song, Jeany Son
Transferable 3D Adversarial Shape Completion using Diffusion Models
Xuelong Dai, Bin Xiao
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
Zhiwei Wang, Hongning Wang, Huazheng Wang
Adversarial Training for Defense Against Label Poisoning Attacks
Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach
GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks
Shuhan Song, Ping Li, Ming Dun et al.
First Line of Defense: A Robust First Layer Mitigates Adversarial Attacks
Janani Suresh, Nancy Nayak, Sheetal Kalyani
Democratic Training Against Universal Adversarial Perturbations
Bing Sun, Jun Sun, Wei Zhao
Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks
Hung Quang Nguyen, Yingjie Lao, Tung Pham et al.
Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian
Thomas Paniagua, Chinmay Savadikar, Tianfu Wu
Training A Secure Model against Data-Free Model Extraction
Zhenyi Wang, Li Shen, Junfeng Guo et al.
Non-Adaptive Adversarial Face Generation
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, Lefan Wang et al.
Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models
Xiaowen Cai, Daizong Liu, Xiaoye Qu et al.
A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
Zhixiao Wu, Yao Lu, Jie Wen et al.
Diffusion Guided Adversarial State Perturbations in Reinforcement Learning
Xiaolin Sun, Feidi Liu, Zhengming Ding et al.
Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations
Shixin Li, Zewei Li, Xiaojing Ma et al.
Transstratal Adversarial Attack: Compromising Multi-Layered Defenses in Text-to-Image Models
Chunlong Xie, Kangjie Chen, Shangwei Guo et al.
AdvEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Yichen Wang, Hangtao Zhang, Hewen Pan et al.
HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models
Han Liu, Jiaqi Li, Zhi Xu et al.
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu et al.
GSBA^K: top-K Geometric Score-based Black-box Attack
Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.
TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks
Fabio Brau, Maura Pintor, Antonio Cinà et al.
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Hung Quang Nguyen, Hieu Nguyen, Anh Ta et al.
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
Binghui Li, Yuanzhi Li
The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses
Greg Gluch, Berkant Turan, Sai Ganesh Nagarajan et al.
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
Prakash Chandra Chhipa, Gautam Vashishtha, Jithamanyu Settur et al.
Generating Less Certain Adversarial Examples Improves Robust Generalization
Minxing Zhang, Michael Backes, Xiao Zhang
Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment
Kaixun Jiang, Zhaoyu Chen, HaiJing Guo et al.
Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models
Hai Yan, Haijian Ma, Xiaowen Cai et al.
Targeted Attack Improves Protection against Unauthorized Diffusion Customization
Boyang Zheng, Chumeng Liang, Xiaoyu Wu
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi et al.
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
Joonsung Jeon, Woo Jae Kim, Suhyeon Ha et al.
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
Zeliang Zhang, Susan Liang, Daiki Shimada et al.
Towards Irreversible Attack: Fooling Scene Text Recognition via Multi-Population Coevolution Search
Jingyu Li, Pengwen Dai, Mingqing Zhu et al.
Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks
Manohar Kaul, Aditya Saibewar, Sadbhavana Babar
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány, Stefan Balauca, Dimitar I. Dimitrov et al.
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Hanxun Huang, Sarah Erfani, Yige Li et al.
Adversary Aware Optimization for Robust Defense
Daniel Wesego, Pedram Rooshenas
A Closer Look at Curriculum Adversarial Training: From an Online Perspective
Lianghe Shi, Weiwei Liu
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares, Karthik Nandakumar