2025 Poster "trojan attacks" Papers
2 papers found
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes, Marco Christiani, David Shriver et al.
ICLR 2025 poster, arXiv:2412.13341
6 citations
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee et al.
ICCV 2025 poster, arXiv:2507.22813