Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale

ICLR 2025

Abstract

Choice of training data distribution greatly influences model behavior. Yet, in large-scale settings, precisely characterizing how changes in training data affect predictions is often difficult due to model training costs. Current practice is instead to extrapolate from scaled-down, inexpensive-to-train proxy models. However, changes in data do not influence smaller and larger models identically. Therefore, understanding how choice of data affects large-scale models raises the question: how does training data distribution influence model behavior across compute scale? We find that small- and large-scale language model predictions (generally) do highly correlate across choice of training data. Equipped with these findings, we characterize how proxy scale affects effectiveness in two downstream proxy model applications: data attribution and dataset selection.
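The core measurement the abstract describes, checking whether a small proxy model and a large model respond consistently to different training-data choices, can be illustrated with a simple correlation across candidate data distributions. The sketch below is hypothetical: the metric values are made-up placeholders, not the paper's data, and the paper's actual experimental setup may differ.

```python
# Hypothetical sketch: do small and large models rank training-data
# choices consistently? Values below are illustrative placeholders.
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# One entry per candidate training-data distribution; each value is a
# downstream metric (e.g., held-out loss) under that data choice.
small_proxy = [3.1, 2.8, 3.5, 2.6, 3.0]  # cheap-to-train proxy model
large_model = [2.4, 2.1, 2.9, 2.0, 2.3]  # expensive target model

r = pearson(small_proxy, large_model)
print(f"correlation across data choices: {r:.3f}")
```

A high correlation in this kind of comparison is what justifies using the proxy model for downstream applications such as data attribution and dataset selection.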
