Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

NeurIPS 2025 (ranked #2219 of 5858 papers)
0 citations · 3 authors · 4 data points

Abstract

Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work expands the trade-off boundary by introducing a novel mechanism, equivalent texture keys, in which multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing resilience against scrubbing attacks without compromising robustness to spoofing. Our code will be available at https://github.com/Hearum/SeekWM.
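The scrubbing half of the window-size trade-off can be illustrated with a toy simulation. The sketch below is not SEEK and is not taken from the authors' repository. It assumes a generic scheme in which the previous `window` tokens are hashed to seed a "green" sub-vocabulary, and in which the generator always emits a green token. All names and constants (`VOCAB_SIZE`, `GAMMA`, `SECRET`, `green_list`, `generate`, `scrub`, `green_fraction`) are hypothetical choices made for illustration.

```python
import hashlib
import random

VOCAB_SIZE = 1000      # hypothetical toy vocabulary size
GAMMA = 0.5            # assumed fraction of the vocabulary marked green
SECRET = b"demo-key"   # hypothetical secret watermark key


def green_list(key_tokens):
    """Return the green sub-vocabulary seeded by the given key tokens."""
    payload = SECRET + repr(tuple(key_tokens)).encode()
    digest = hashlib.sha256(payload).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def generate(length, window, seed=0):
    """Toy generator: every new token is drawn from the green list
    keyed by the previous `window` tokens (a 'hard' watermark)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(window)]  # unmarked prefix
    for _ in range(length):
        greens = sorted(green_list(tokens[-window:]))
        tokens.append(rng.choice(greens))
    return tokens


def scrub(tokens, rate, seed=1):
    """Toy scrubbing attack: replace each token with a random one
    with probability `rate`."""
    rng = random.Random(seed)
    return [rng.randrange(VOCAB_SIZE) if rng.random() < rate else t
            for t in tokens]


def green_fraction(tokens, window):
    """Detection statistic: share of tokens that fall in the green list
    keyed by their own preceding window."""
    scored = range(window, len(tokens))
    hits = sum(tokens[i] in green_list(tokens[i - window:i]) for i in scored)
    return hits / len(scored)


if __name__ == "__main__":
    for w in (1, 4):
        text = generate(2000, w)
        attacked = scrub(text, rate=0.2)
        print(w, green_fraction(text, w), green_fraction(attacked, w))
```

In this toy setting a single replaced token invalidates the key of every later position whose window contains it, so a larger window loses more green hits under the same edit rate. The opposite side of the trade-off (small windows being easier to reverse-engineer for spoofing) is not modeled here.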

Citation History

Jan 25, 2026: 0
Jan 27, 2026: 0
Jan 27, 2026: 0
Jan 28, 2026: 0