Poster "adversarial attacks" Papers

60 papers found • Page 1 of 2

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin et al.

NeurIPS 2025 • poster • arXiv:2505.21494 • 12 citations

Adversarial Robustness of Discriminative Self-Supervised Learning in Vision

Ömer Veysel Çağatan, Ömer Tal, M. Emre Gursoy

ICCV 2025 • poster • arXiv:2503.06361

Adversary Aware Optimization for Robust Defense

Daniel Wesego, Pedram Rooshenas

NeurIPS 2025 • poster

Confidence Elicitation: A New Attack Vector for Large Language Models

Brian Formento, Chuan Sheng Foo, See-Kiong Ng

ICLR 2025 • poster • arXiv:2502.04643 • 2 citations

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

Yun Xing, Yue Cao, Nhat Chung et al.

NeurIPS 2025 • poster • arXiv:2506.16690

Detecting Adversarial Data Using Perturbation Forgery

Qian Wang, Chen Li, Yuchen Luo et al.

CVPR 2025 • poster • arXiv:2405.16226 • 2 citations

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Shuyang Hao, Bryan Hooi, Jun Liu et al.

CVPR 2025 • poster • arXiv:2411.18000 • 5 citations

Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models

Hai Yan, Haijian Ma, Xiaowen Cai et al.

NeurIPS 2025 • poster

GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack

Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.

ICLR 2025 • poster • arXiv:2503.12827 • 2 citations

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

Zheng Chen, Yushi Feng, Jisheng Dang et al.

NeurIPS 2025 • poster • arXiv:2502.15902

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • poster • arXiv:2406.14393 • 9 citations

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025 • poster • arXiv:2501.04931 • 28 citations

Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

Jie Ren, Zhenwei Dai, Xianfeng Tang et al.

NeurIPS 2025 • poster • arXiv:2506.00359 • 6 citations

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Lukas Aichberger, Alasdair Paren, Guohao Li et al.

NeurIPS 2025 • poster • arXiv:2503.10809 • 10 citations

MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework

Ping Guo, Cheng Gong, Fei Liu et al.

CVPR 2025 • poster • arXiv:2501.07251

Non-Adaptive Adversarial Face Generation

Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.

NeurIPS 2025 • poster • arXiv:2507.12107 • 1 citation

NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary

Zezeng Li, Xiaoyu Du, Na Lei et al.

CVPR 2025 • poster • arXiv:2503.00063 • 4 citations

On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective

Ning Zhang, Henry Kenlay, Li Zhang et al.

NeurIPS 2025 • poster • arXiv:2506.01213

Robust LLM safeguarding via refusal feature adversarial training

Lei Yu, Virginie Do, Karen Hambardzumyan et al.

ICLR 2025 • poster • arXiv:2409.20089

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

Buyun Liang, Liangzu Peng, Jinqi Luo et al.

NeurIPS 2025 • poster • arXiv:2510.04398

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh et al.

NeurIPS 2025 • poster • arXiv:2511.01126

TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification

Dongyoon Yang, Jihu Lee, Yongdai Kim

CVPR 2025 • poster • arXiv:2505.06580

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz et al.

ICLR 2025 • poster • arXiv:2405.13922 • 2 citations

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Yiming Liu, Kezhao Liu, Yao Xiao et al.

ICLR 2025 • poster • arXiv:2404.14309 • 6 citations

$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

Guanjie Chen, Xinyu Zhao, Tianlong Chen et al.

ICML 2024 • poster

Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework

Haonan Huang, Guoxu Zhou, Yanghang Zheng et al.

ICML 2024 • poster

Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang, Xingjun Ma, Xin Wang et al.

ECCV 2024 • poster • arXiv:2311.11261 • 34 citations

A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks

Feiyu CHEN, Wei Lin, Ziquan Liu et al.

ECCV 2024 • poster • 1 citation

Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

Chung-En Sun, Sicun Gao, Lily Weng

ICML 2024 • poster

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

Vitali Petsiuk, Kate Saenko

ECCV 2024 • poster • arXiv:2404.13706 • 8 citations

CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks

Shashank Agnihotri, Steffen Jung, Margret Keuper

ICML 2024 • poster

DataFreeShield: Defending Adversarial Attacks without Training Data

Hyeyoon Lee, Kanghyun Choi, Dain Kwon et al.

ICML 2024 • poster

Enhancing Adversarial Robustness in SNNs with Sparse Gradients

Yujia Liu, Tong Bu, Ding Jianhao et al.

ICML 2024 • poster

Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data

Yanmeng Yao, Xiaohan Zhao, Bin Gu

ECCV 2024 • poster • 9 citations

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

Jon Vadillo, Roberto Santana, Jose A Lozano

ICML 2024 • poster

Fast Adversarial Attacks on Language Models In One GPU Minute

Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan et al.

ICML 2024 • poster

Graph Neural Network Explanations are Fragile

Jiate Li, Meng Pang, Yun Dong et al.

ICML 2024 • poster

Improved Dimensionality Dependence for Zeroth-Order Optimisation over Cross-Polytopes

Weijia Shao

ICML 2024 • poster

Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution

Eslam Zaher, Maciej Trzaskowski, Quan Nguyen et al.

ICML 2024 • poster

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024 • poster • arXiv:2311.17600 • 183 citations

MultiDelete for Multimodal Machine Unlearning

Jiali Cheng, Hadi Amiri

ECCV 2024 • poster • arXiv:2311.12047 • 13 citations

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

Yihao Zhang, Hangzhou He, Jingyu Zhu et al.

ICML 2024 • poster

Rethinking Adversarial Robustness in the Context of the Right to be Forgotten

Chenxu Zhao, Wei Qian, Yangyi Li et al.

ICML 2024 • poster

Rethinking Independent Cross-Entropy Loss For Graph-Structured Data

Rui Miao, Kaixiong Zhou, Yili Wang et al.

ICML 2024 • poster

Revisiting Character-level Adversarial Attacks for Language Models

Elias Abad Rocamora, Yongtao Wu, Fanghui Liu et al.

ICML 2024 • poster

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Zhuowen Yuan, Zidi Xiong, Yi Zeng et al.

ICML 2024 • poster

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

Christian Schlarmann, Naman Singh, Francesco Croce et al.

ICML 2024 • poster

Robustness Tokens: Towards Adversarial Robustness of Transformers

Brian Pulfer, Yury Belousov, Slava Voloshynovskiy

ECCV 2024 • poster • arXiv:2503.10191

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.

ICML 2024 • poster

Shedding More Light on Robust Classifiers under the lens of Energy-based Models

Mujtaba Hussain Mirza, Maria Rosaria Briglia, Senad Beadini et al.

ECCV 2024 • poster • arXiv:2407.06315 • 7 citations