2025 Spotlight "mechanistic interpretability" Papers
2 papers found
A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
Guan Zhe Hong, Nishanth Dikkala, Enming Luo et al.
NeurIPS 2025 | Spotlight | arXiv:2411.04105
3 citations
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
NeurIPS 2025 | Spotlight | arXiv:2507.08802
9 citations