Aaron Mueller

Papers: 4

Total Citations: 504

Papers (4)

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Inverse Scaling: When Bigger Isn't Better

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

MIB: A Mechanistic Interpretability Benchmark