NEURIPS 2025 "linear representation hypothesis" Papers
2 papers found
LLM Unlearning via Neural Activation Redirection
William Shen, Xinchi Qiu, Meghdad Kurmanji et al.
NEURIPS 2025 poster arXiv:2502.07218
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
NEURIPS 2025 spotlight arXiv:2507.08802
9 citations