Atticus Geiger

5 Papers

109 Total Citations

Papers (5)

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

MIB: A Mechanistic Interpretability Benchmark

Causal Abstractions of Neural Networks

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca