Atticus Geiger

5 Papers

109 Total Citations

Papers (5)

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

MIB: A Mechanistic Interpretability Benchmark

Causal Abstractions of Neural Networks

NeurIPS 2021 · arXiv

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

NeurIPS 2022 · arXiv

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

NeurIPS 2023 · arXiv