Yonatan Belinkov
13
Papers
659
Total Citations
1
Affiliation
Affiliations
Technion - Israel Institute of Technology
Papers (13)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025
252
citations
Linearity of Relation Decoding in Transformer Language Models
ICLR 2024
140
citations
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
ICLR 2024
97
citations
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
NeurIPS 2017 (arXiv)
92
citations
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
ICLR 2025
63
citations
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
9
citations
Accelerating the Global Aggregation of Local Explanations
AAAI 2024 (arXiv)
6
citations
Editing Implicit Assumptions in Text-to-Image Diffusion Models
ICCV 2023 (arXiv)
0
citations
Unsupervised Translation of Emergent Communication
AAAI 2025
0
citations
Investigating Gender Bias in Language Models Using Causal Mediation Analysis
NeurIPS 2020
0
citations
IRM—when it works and when it doesn't: A test case of natural language inference
NeurIPS 2021
0
citations
Locating and Editing Factual Associations in GPT
NeurIPS 2022
0
citations
Measures of Information Reflect Memorization Patterns
NeurIPS 2022
0
citations