On the Origins of Linear Representations in Large Language Models

0 citations

#10 in ICML 2024 of 2635 papers

Authors

Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch

Topics

linear representations, latent variable model, next token prediction, loss function analysis, gradient descent bias, semantic concept encoding, representation space geometry

Abstract

Several recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of next-token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. Experiments further substantiate the theory.
