"inner interpretability" Papers
2 papers found
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
ICLR 2025 poster, arXiv:2410.08025
11 citations
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas, Federico Adolfi, David Poeppel et al.
ICML 2024 poster