When do neural networks learn world models?

ICML 2025

Abstract

Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions, even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
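
The abstract models task solutions as Boolean functions and analyzes them through the Fourier-Walsh transform, where a low-degree bias means most of a function's Fourier mass sits on small coordinate subsets. The sketch below is illustrative only (not taken from the paper): the `majority` example task and the brute-force coefficient computation are assumptions chosen to make the idea concrete. It computes the Fourier-Walsh coefficients fhat(S) = E_x[f(x) · prod_{i in S} x_i] of a Boolean function over {-1, +1}^n and shows how the mass concentrates on low degrees.

```python
# Illustrative sketch (not from the paper): Fourier-Walsh expansion of a
# Boolean function f: {-1, +1}^n -> {-1, +1}.  Every such f decomposes as
#   f(x) = sum over subsets S of fhat(S) * prod_{i in S} x_i,
# with fhat(S) = E_x[f(x) * prod_{i in S} x_i].  The degree of a term is |S|;
# a low-degree bias means most squared-coefficient mass lies on small |S|.
from itertools import combinations, product


def fourier_walsh_coefficients(f, n):
    """Return {S: fhat(S)} for a Boolean function f over {-1, +1}^n (brute force)."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = 0.0
            for x in points:
                chi_S = 1
                for i in S:          # character chi_S(x) = prod_{i in S} x_i
                    chi_S *= x[i]
                total += f(x) * chi_S
            coeffs[S] = total / len(points)  # average over all 2^n inputs
    return coeffs


if __name__ == "__main__":
    n = 3
    majority = lambda x: 1 if sum(x) > 0 else -1  # hypothetical example task
    for S, c in fourier_walsh_coefficients(majority, n).items():
        if abs(c) > 1e-9:
            print(f"S={S}  degree={len(S)}  fhat={c:+.2f}")
    # Output: three degree-1 terms with fhat = +0.50 and one degree-3 term
    # with fhat = -0.50, so 75% of the Fourier mass sits on degree 1.
```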
