Decomposing and Editing Predictions by Modeling Model Computation

#10 of 2635 papers in ICML 2024
3 Authors

Abstract

How does the internal computation of a machine learning model transform inputs into predictions? To tackle this question, we introduce a framework called component modeling for decomposing a model prediction in terms of its components (architectural "building blocks" such as convolution filters or attention heads). We focus on a special case of this framework, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets and modalities. Finally, we show that COAR directly enables effective model editing. Our code is available at github.com/MadryLab/modelcomponents.
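
To make "counterfactual impact of individual components" concrete, the sketch below scores each component of a toy model by how much the output changes when that component alone is ablated (zeroed out). This is only an illustration of the attribution idea, not the COAR algorithm, and it uses nothing from the modelcomponents repository: `toy_model`, `ablation_attributions`, and the three hand-written "components" are hypothetical names invented for this example.

```python
from typing import Callable, List, Sequence


def toy_model(x: Sequence[float], mask: Sequence[int]) -> float:
    """Hypothetical model with three 'components'.

    Each component's contribution is multiplied by its mask bit,
    so mask[i] == 0 means component i is ablated (zeroed out).
    """
    c0 = mask[0] * (2.0 * x[0])
    c1 = mask[1] * (x[1] - 1.0)
    c2 = mask[2] * (0.5 * x[0] * x[1])
    return c0 + c1 + c2


def ablation_attributions(
    model: Callable[[Sequence[float], Sequence[int]], float],
    x: Sequence[float],
    n_components: int,
) -> List[float]:
    """Score each component by the output drop when only it is ablated."""
    full = [1] * n_components
    baseline = model(x, full)  # output with every component active
    scores = []
    for i in range(n_components):
        mask = list(full)
        mask[i] = 0  # ablate component i only
        scores.append(baseline - model(x, mask))
    return scores


scores = ablation_attributions(toy_model, [1.0, 3.0], 3)
print(scores)
```

Ablating one component at a time costs one forward pass per component per input, which is the kind of expense a scalable attribution estimator would need to avoid on models with many components.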
