The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses

0 citations

arXiv:2410.08864

Citations: #1938 of 5858 papers in NeurIPS 2025

Authors

Greg Gluch, Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta

Topics

backdoor-based watermarks, adversarial defenses, transferable attacks, fully homomorphic encryption, VC-dimension, interactive protocol, cryptographic techniques

Abstract

We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive, but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a watermark, an adversarial defense, or a transferable attack. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling all efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.

Citation History (chart; dates shown: Jan 25, 2026; Jan 27, 2026; Jan 31, 2026)