Poster Papers matching "adversarial attacks"
60 papers found • Page 1 of 2
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia, Sensen Gao, Simeng Qin et al.
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan, Ömer Tal, M. Emre Gursoy
Adversary Aware Optimization for Robust Defense
Daniel Wesego, Pedram Rooshenas
Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento, Chuan Sheng Foo, See-Kiong Ng
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches
Yun Xing, Yue Cao, Nhat Chung et al.
Detecting Adversarial Data Using Perturbation Forgery
Qian Wang, Chen Li, Yuchen Luo et al.
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Jun Liu et al.
Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models
Hai Yan, Haijian Ma, Xiaowen Cai et al.
GSBA^K: top-K Geometric Score-based Black-box Attack
Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.
IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector
Zheng Chen, Yushi Feng, Jisheng Dang et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Jie Ren, Zhenwei Dai, Xianfeng Tang et al.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework
Ping Guo, Cheng Gong, Fei Liu et al.
Non-Adaptive Adversarial Face Generation
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
Zezeng Li, Xiaoyu Du, Na Lei et al.
On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective
Ning Zhang, Henry Kenlay, Li Zhang et al.
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo et al.
Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh et al.
TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification
Dongyoon Yang, Jihu Lee, Yongdai Kim
Towards Certification of Uncertainty Calibration under Adversarial Attacks
Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz et al.
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Yiming Liu, Kezhao Liu, Yao Xiao et al.
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen et al.
Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework
Haonan Huang, Guoxu Zhou, Yanghang Zheng et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks
Feiyu Chen, Wei Lin, Ziquan Liu et al.
Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents
Chung-En Sun, Sicun Gao, Lily Weng
Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
Vitali Petsiuk, Kate Saenko
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks
Shashank Agnihotri, Steffen Jung, Margret Keuper
DataFreeShield: Defending Adversarial Attacks without Training Data
Hyeyoon Lee, Kanghyun Choi, Dain Kwon et al.
Enhancing Adversarial Robustness in SNNs with Sparse Gradients
Yujia Liu, Tong Bu, Jianhao Ding et al.
Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
Yanmeng Yao, Xiaohan Zhao, Bin Gu
Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
Jon Vadillo, Roberto Santana, Jose A Lozano
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan et al.
Graph Neural Network Explanations are Fragile
Jiate Li, Meng Pang, Yun Dong et al.
Improved Dimensionality Dependence for Zeroth-Order Optimisation over Cross-Polytopes
Weijia Shao
Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
Eslam Zaher, Maciej Trzaskowski, Quan Nguyen et al.
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng, Hadi Amiri
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang, Hangzhou He, Jingyu Zhu et al.
Rethinking Adversarial Robustness in the Context of the Right to be Forgotten
Chenxu Zhao, Wei Qian, Yangyi Li et al.
Rethinking Independent Cross-Entropy Loss For Graph-Structured Data
Rui Miao, Kaixiong Zhou, Yili Wang et al.
Revisiting Character-level Adversarial Attacks for Language Models
Elias Abad Rocamora, Yongtao Wu, Fanghui Liu et al.
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan, Zidi Xiong, Yi Zeng et al.
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Singh, Francesco Croce et al.
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
Shedding More Light on Robust Classifiers under the lens of Energy-based Models
Mujtaba Hussain Mirza, Maria Rosaria Briglia, Senad Beadini et al.