Spotlight Papers by Abhay Sheshadri
2 papers found
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo, Aaquib Syed, Abhay Sheshadri et al.
ICML 2025, spotlight, arXiv:2410.12949
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael et al.
NeurIPS 2025, spotlight, arXiv:2506.18032
5 citations