Yonatan Belinkov
7
Papers
567
Total Citations
1
Affiliations
Affiliations
Technion - Israel Institute of Technology
Papers (7)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025
252
citations
Linearity of Relation Decoding in Transformer Language Models
ICLR 2024
140
citations
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
ICLR 2024
97
citations
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
ICLR 2025
63
citations
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
9
citations
Accelerating the Global Aggregation of Local Explanations
AAAI 2024
6
citations
Unsupervised Translation of Emergent Communication
AAAI 2025
0
citations