Papers by Abhay Sheshadri
2 papers found
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo, Aaquib Syed, Abhay Sheshadri et al.
ICML 2025 spotlight
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael et al.
NeurIPS 2025 spotlight, arXiv:2506.18032