2025 "adversarial attacks" Papers

22 papers found

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NeurIPS 2025 · poster · arXiv:2505.21494
12 citations

Adversary Aware Optimization for Robust Defense

Daniel Wesego, Pedram Rooshenas

NeurIPS 2025 · poster

Bits Leaked per Query: Information-Theoretic Bounds for Adversarial Attacks on LLMs

Masahiro Kaneko, Timothy Baldwin

NeurIPS 2025 · spotlight · arXiv:2510.17000

Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness

Longwei Wang, Ifrat Ikhtear Uddin, KC Santosh et al.

NeurIPS 2025 · spotlight · arXiv:2510.16171
2 citations

Confidence Elicitation: A New Attack Vector for Large Language Models

Brian Formento, Chuan Sheng Foo, See-Kiong Ng

ICLR 2025 · poster · arXiv:2502.04643
2 citations

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

Yun Xing, Yue Cao, Nhat Chung et al.

NeurIPS 2025 · poster · arXiv:2506.16690

Detecting Adversarial Data Using Perturbation Forgery

Qian Wang, Chen Li, Yuchen Luo et al.

CVPR 2025 · poster · arXiv:2405.16226
2 citations

Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks

Steffen Schotthöfer, Lexie Yang, Stefan Schnake

NeurIPS 2025 · oral · arXiv:2505.08022
6 citations

Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models

Hai Yan, Haijian Ma, Xiaowen Cai et al.

NeurIPS 2025 · poster

GSBA$^K$: top-$K$ Geometric Score-based Black-box Attack

Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.

ICLR 2025 · poster · arXiv:2503.12827
2 citations

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

Zheng Chen, Yushi Feng, Jisheng Dang et al.

NeurIPS 2025 · poster · arXiv:2502.15902

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 · poster · arXiv:2406.14393
9 citations

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025 · poster · arXiv:2501.04931
28 citations

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Lukas Aichberger, Alasdair Paren, Guohao Li et al.

NeurIPS 2025 · poster · arXiv:2503.10809
10 citations

MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework

Ping Guo, Cheng Gong, Fei Liu et al.

CVPR 2025 · poster · arXiv:2501.07251

Non-Adaptive Adversarial Face Generation

Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.

NeurIPS 2025 · poster · arXiv:2507.12107
1 citation

NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary

Zezeng Li, Xiaoyu Du, Na Lei et al.

CVPR 2025 · poster · arXiv:2503.00063
4 citations

Robust LLM Safeguarding via Refusal Feature Adversarial Training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025 · poster · arXiv:2409.20089

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh et al.

NeurIPS 2025 · poster · arXiv:2511.01126

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz et al.

ICLR 2025 · poster · arXiv:2405.13922
2 citations

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Yiming Liu, Kezhao Liu, Yao Xiao et al.

ICLR 2025 · poster · arXiv:2404.14309
6 citations

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Zi Liang, Qingqing Ye, Xuan Liu et al.

NeurIPS 2025 · spotlight