🧬 Robustness

Adversarial Attacks

Crafting adversarial examples

230 papers (showing top 100) · 1,469 total citations
Mar '24 – Feb '26 · 186 papers
Also includes: adversarial attacks, adversarial examples, attack methods, adversarial perturbations

Top Papers

#1

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

ICLR 2025 · arXiv:2404.02151
375 citations
#2

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025 · arXiv:2404.16873
123 citations
#3

Rethinking Model Ensemble in Transfer-based Adversarial Attacks

Huanran Chen, Yichi Zhang, Yinpeng Dong et al.

ICLR 2024 · arXiv:2303.09105
96 citations
#4

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024 · arXiv:2312.03777
80 citations
#5

Dissecting Adversarial Robustness of Multimodal LM Agents

Chen Wu, Rishi Shah, Jing Yu Koh et al.

ICLR 2025 · arXiv:2406.12814
77 citations
#6

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.

ICLR 2024 · arXiv:2310.00076
74 citations
#7

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024 · arXiv:2305.11618
48 citations
#8

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 · arXiv:2405.18540
Tags: automated red-teaming, large language models, safety tuning, reinforcement learning (+4)
42 citations
#9

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Lihua Jing, Rui Wang, Wenqi Ren et al.

CVPR 2024 · arXiv:2404.16452
39 citations
#10

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025 · arXiv:2406.12027
Tags: adversarial perturbations, style mimicry, image generation models, artistic style protection (+2)
35 citations
#11

Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model

Decheng Liu, Xijun Wang, Chunlei Peng et al.

AAAI 2024 · arXiv:2312.11285
Tags: adversarial attacks, face recognition models, latent diffusion model, identity-sensitive conditioning (+4)
34 citations
#12

Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

CVPR 2024 · arXiv:2404.09401
34 citations
#13

Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer

ICLR 2025
25 citations
#14

ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification

Xiao Li, Wenxuan Sun, Huanran Chen et al.

ICLR 2025 · arXiv:2408.00315
24 citations
#15

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025
20 citations
#16

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 · arXiv:2503.08269
Tags: customized portrait generation, facial adversarial attacks, privacy protection, face recognition systems (+3)
20 citations
#17

Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

Han Wu, Guanyan Ou, Weibin Wu et al.

CVPR 2024
19 citations
#18

Understanding and Enhancing the Transferability of Jailbreaking Attacks

Runqi Lin, Bo Han, Fengwang Li et al.

ICLR 2025 · arXiv:2502.03052
16 citations
#19

Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise

Yixin Liu, Kaidi Xu, Xun Chen et al.

AAAI 2024 · arXiv:2311.13091
Tags: unlearnable examples, data poisoning, adversarial training, defensive noise (+4)
16 citations
#20

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alex Robey, Fabian Latorre, George Pappas et al.

ICLR 2024 · arXiv:2306.11035
15 citations
#21

Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving

Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli

CVPR 2024 · arXiv:2306.15755
14 citations
#22

Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples

Junhao Dong, Piotr Koniusz, Junxi Chen et al.

CVPR 2024
14 citations
#23

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du et al.

AAAI 2025 · arXiv:2412.10713
12 citations
#24

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NeurIPS 2025 · arXiv:2505.21494
Tags: adversarial attacks, multimodal large language models, feature alignment, optimal transport (+4)
12 citations
#25

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Ziqiang Li, Hong Sun, Pengfei Xia et al.

ICLR 2024 · arXiv:2306.08386
11 citations
#26

Instant Adversarial Purification with Adversarial Consistency Distillation

Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.

CVPR 2025 · arXiv:2408.17064
11 citations
#27

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

Ganghua Wang, Xun Xian, Ashish Kundu et al.

ICLR 2024 · arXiv:2310.10780
10 citations
#28

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL

Xiangyu Liu, Souradip Chakraborty, Yanchao Sun et al.

ICLR 2024 · arXiv:2305.17342
9 citations
#29

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025
9 citations
#30

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 · arXiv:2411.15720
9 citations
#31

ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

Feiyang Wang, Xingquan Zuo, Hai Huang et al.

AAAI 2025
9 citations
#32

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion

Honglei Miao, Fan Ma, Ruijie Quan et al.

AAAI 2025 · arXiv:2408.00352
8 citations
#33

Enhancing Adversarial Transferability with Adversarial Weight Tuning

Jiahao Chen, Zhou Feng, Rui Zeng et al.

AAAI 2025 · arXiv:2408.09469
8 citations
#34

AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses

Nicholas Carlini, Edoardo Debenedetti, Javier Rando et al.

ICML 2025 · arXiv:2503.01811
8 citations
#35

Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Yaling Shen, Zhixiong Zhuang, Kun Yuan et al.

AAAI 2025 · arXiv:2502.02438
7 citations
#36

On the Robustness of Neural-Enhanced Video Streaming against Adversarial Attacks

Qihua Zhou, Jingcai Guo, Song Guo et al.

AAAI 2024
7 citations
#37

Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

Youheng Sun, Shengming Yuan, Xuanhan Wang et al.

ECCV 2024 · arXiv:2407.12292
Tags: targeted adversarial attack, adversarial example generation, latent representation injection, class-agnostic attack (+4)
7 citations
#38

Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning

Qingqing Fang, Qinliang Su, Wenxi Lv et al.

AAAI 2025 · arXiv:2412.12850
6 citations
#39

UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping

Yanjie Li, Kaisheng Liang, Bin Xiao

ICLR 2025
6 citations
#40

Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual

Ruichu Cai, Yuxuan Zhu, Jie Qiao et al.

AAAI 2024 · arXiv:2312.13628
Tags: adversarial examples, causal generating process, counterfactual adversarial examples, unrestricted attacks (+4)
5 citations
#41

Value at Adversarial Risk: A Graph Defense Strategy against Cost-Aware Attacks

Junlong Liao, Wenda Fu, Cong Wang et al.

AAAI 2024
5 citations
#42

Adversarial Feature Map Pruning for Backdoor

Dong Huang, Qingwen Bu

ICLR 2024 · arXiv:2307.11565
5 citations
#43

Theoretical Understanding of Learning from Adversarial Perturbations

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

ICLR 2024 · arXiv:2402.10470
4 citations
#44

Understanding Model Ensemble in Transferable Adversarial Attack

Wei Yao, Zeliang Zhang, Huayi Tang et al.

ICML 2025 · arXiv:2410.06851
4 citations
#45

HUANG: A Robust Diffusion Model-based Targeted Adversarial Attack Against Deep Hashing Retrieval

Chihan Huang, Xiaobo Shen

AAAI 2025
4 citations
#46

Enhancing Robustness in Incremental Learning with Adversarial Training

Seungju Cho, Hongsin Lee, Changick Kim

AAAI 2025 · arXiv:2312.03289
4 citations
#47

When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning

Naheed Anjum Arafat, Debabrota Basu, Yulia Gel et al.

AAAI 2025 · arXiv:2409.14161
4 citations
#48

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Kanghua Mo, Li Hu, Yucheng Long et al.

NeurIPS 2025 · arXiv:2508.02110
Tags: tool metadata manipulation, llm agent security, black-box optimization, privacy leakage attacks (+4)
4 citations
#49

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images

Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.

CVPR 2025 · arXiv:2412.09910
4 citations
#50

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection

Xiao Li, Yiming Zhu, Yifan Huang et al.

ICCV 2025 · arXiv:2506.23581
3 citations
#51

A Sample-Level Evaluation and Generative Framework for Model Inversion Attacks

Haoyang Li, Li Bai, Qingqing Ye et al.

AAAI 2025 · arXiv:2502.19070
3 citations
#52

AIM: Additional Image Guided Generation of Transferable Adversarial Attacks

Teng Li, Xingjun Ma, Yu-Gang Jiang

AAAI 2025 · arXiv:2501.01106
3 citations
#53

ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs

Hao Di, Tong He, Haishan Ye et al.

ICLR 2025
2 citations
#54

Confidence Elicitation: A New Attack Vector for Large Language Models

Brian Formento, Chuan Sheng Foo, See-Kiong Ng

ICLR 2025 · arXiv:2502.04643
Tags: adversarial robustness, large language models, black-box attacks, confidence elicitation (+4)
2 citations
#55

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld

ICLR 2025
2 citations
#56

A Unified, Resilient, and Explainable Adversarial Patch Detector

Vishesh Kumar, Akshay Agarwal

CVPR 2025
2 citations
#57

Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors

Tao Lin, Lijia Yu, Gaojie Jin et al.

ECCV 2024 · arXiv:2410.10091
Tags: adversarial robustness, object detection systems, physical adversarial attacks, adversarial triggers (+3)
2 citations
#58

Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation

Yuan Gan, Jiaxu Miao, Yunze Wang et al.

CVPR 2025 · arXiv:2506.01591
2 citations
#59

Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng, Zeze Tao, Huibing Wang et al.

ICCV 2025 · arXiv:2508.05689
2 citations
#60

Detecting Adversarial Data Using Perturbation Forgery

Qian Wang, Chen Li, Yuchen Luo et al.

CVPR 2025 · arXiv:2405.16226
Tags: adversarial detection, adversarial attacks, noise patterns, generative models (+3)
2 citations
#61

Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior

Chanhui Lee, Yeonghwan Song, Jeany Son

CVPR 2025 · arXiv:2502.21048
1 citation
#62

Transferable 3D Adversarial Shape Completion using Diffusion Models

Xuelong Dai, Bin Xiao

ECCV 2024 · arXiv:2407.10077
1 citation
#63

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

Zhiwei Wang, Hongning Wang, Huazheng Wang

AAAI 2024 · arXiv:2402.13487
Tags: stochastic multi-armed bandits, adversarial attacks, reward poisoning attacks, attack detection (+3)
1 citation
#64

Adversarial Training for Defense Against Label Poisoning Attacks

Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

ICLR 2025 · arXiv:2502.17121
1 citation
#65

GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks

Shuhan Song, Ping Li, Ming Dun et al.

ICLR 2025
1 citation
#66

First Line of Defense: A Robust First Layer Mitigates Adversarial Attacks

Janani Suresh, Nancy Nayak, Sheetal Kalyani

AAAI 2025 · arXiv:2408.11680
1 citation
#67

Democratic Training Against Universal Adversarial Perturbations

Bing Sun, Jun Sun, Wei Zhao

ICLR 2025 · arXiv:2502.05542
1 citation
#68

Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks

Hung Quang Nguyen, Yingjie Lao, Tung Pham et al.

ICLR 2024 · arXiv:2310.00567
1 citation
#69

Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian

Thomas Paniagua, Chinmay Savadikar, Tianfu Wu

ICML 2025
1 citation
#70

Training A Secure Model against Data-Free Model Extraction

Zhenyi Wang, Li Shen, Junfeng Guo et al.

ECCV 2024
1 citation
#71

Non-Adaptive Adversarial Face Generation

Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.

NeurIPS 2025 · arXiv:2507.12107
Tags: adversarial face generation, face recognition systems, adversarial attacks, feature space structure (+3)
1 citation
#72

Semantic Representation Attack against Aligned Large Language Models

Jiawei Lian, Jianhong Pan, Lefan Wang et al.

NeurIPS 2025
1 citation
#73

Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models

Xiaowen Cai, Daizong Liu, Xiaoye Qu et al.

NeurIPS 2025
citations not collected
#74

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

Zhixiao Wu, Yao Lu, Jie Wen et al.

NeurIPS 2025
citations not collected
#75

Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

Xiaolin Sun, Feidi Liu, Zhengming Ding et al.

NeurIPS 2025
citations not collected
#76

Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Shixin Li, Zewei Li, Xiaojing Ma et al.

NeurIPS 2025
citations not collected
#77

Transstratal Adversarial Attack: Compromising Multi-Layered Defenses in Text-to-Image Models

Chunlong Xie, Kangjie Chen, Shangwei Guo et al.

NeurIPS 2025
Tags: adversarial attacks, text-to-image models, multi-layered defenses, safety mechanisms (+4)
citations not collected
#78

AdvEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents

Yichen Wang, Hangtao Zhang, Hewen Pan et al.

NeurIPS 2025
citations not collected
#79

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu et al.

NeurIPS 2025
Tags: adversarial attack, vision-language models, black-box attack, contrastive learning (+3)
citations not collected
#80

A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu et al.

NeurIPS 2025
citations not collected
#81

GSBA^K: top-K Geometric Score-based Black-box Attack

Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.

ICLR 2025 · arXiv:2503.12827
Tags: adversarial attacks, black-box attacks, score-based attacks, top-k predictions (+4)
citations not collected
#82

TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks

Fabio Brau, Maura Pintor, Antonio Cinà et al.

NeurIPS 2025
Tags: adversarial examples, black-box attacks, transfer attacks, ensemble methods (+4)
citations not collected
#83

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

Hung Quang Nguyen, Hieu Nguyen, Anh Ta et al.

ICLR 2025
citations not collected
#84

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Binghui Li, Yuanzhi Li

ICLR 2025
citations not collected
#85

The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses

Greg Gluch, Berkant Turan, Sai Ganesh Nagarajan et al.

NeurIPS 2025 · arXiv:2410.08864
Tags: backdoor-based watermarks, adversarial defenses, transferable attacks, fully homomorphic encryption (+3)
citations not collected
#86

ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks

Prakash Chandra Chhipa, Gautam Vashishtha, Jithamanyu Settur et al.

ICLR 2025
citations not collected
#87

Generating Less Certain Adversarial Examples Improves Robust Generalization

Minxing Zhang, Michael Backes, Xiao Zhang

ICLR 2025 · arXiv:2310.04539
Tags: adversarial training, robust generalization, adversarial examples, model certainty (+3)
citations not collected
#88

Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment

Kaixun Jiang, Zhaoyu Chen, HaiJing Guo et al.

NeurIPS 2025
citations not collected
#89

Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models

Hai Yan, Haijian Ma, Xiaowen Cai et al.

NeurIPS 2025
Tags: adversarial attacks, multimodal large language models, distribution approximation theory, cross-image transfer attacks (+4)
citations not collected
#90

Targeted Attack Improves Protection against Unauthorized Diffusion Customization

Boyang Zheng, Chumeng Liang, Xiaoyu Wu

ICLR 2025
citations not collected
#91

Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text

Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi et al.

NeurIPS 2025
citations not collected
#92

AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption

Joonsung Jeon, Woo Jae Kim, Suhyeon Ha et al.

ICLR 2025 · arXiv:2503.10081
Tags: adversarial perturbations, diffusion models, image inpainting, attention mechanism disruption (+3)
citations not collected
#93

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives

Zeliang Zhang, Susan Liang, Daiki Shimada et al.

ICLR 2025
citations not collected
#94

Towards Irreversible Attack: Fooling Scene Text Recognition via Multi-Population Coevolution Search

Jingyu Li, Pengwen Dai, Mingqing Zhu et al.

NeurIPS 2025
citations not collected
#95

Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks

Manohar Kaul, Aditya Saibewar, Sadbhavana Babar

ICLR 2025
citations not collected
#96

MixAT: Combining Continuous and Discrete Adversarial Training for LLMs

Csaba Dékány, Stefan Balauca, Dimitar I. Dimitrov et al.

NeurIPS 2025
citations not collected
#97

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Hanxun Huang, Sarah Erfani, Yige Li et al.

ICLR 2025
citations not collected
#98

Adversary Aware Optimization for Robust Defense

Daniel Wesego, Pedram Rooshenas

NeurIPS 2025
Tags: adversarial attacks, optimization-based purification, diffusion prior, score-based generative models (+4)
citations not collected
#99

A Closer Look at Curriculum Adversarial Training: From an Online Perspective

Lianghe Shi, Weiwei Liu

AAAI 2024
citations not collected
#100

Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models

Samar Fares, Karthik Nandakumar

CVPR 2024
citations not collected