Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
Abstract
Many recent studies have found evidence for emergent reasoning capabilities in large language models (LLMs), but debate persists concerning the robustness of these capabilities, and the extent to which they depend on structured reasoning mechanisms. To shed light on these issues, we study the internal mechanisms that support abstract reasoning in LLMs. We identify an emergent symbolic architecture that implements abstract reasoning via a series of three computations. In early layers, symbol abstraction heads convert input tokens to abstract variables based on the relations between those tokens. In intermediate layers, symbolic induction heads perform sequence induction over these abstract variables. Finally, in later layers, retrieval heads predict the next token by retrieving the value associated with the predicted abstract variable. These results point toward a resolution of the longstanding debate between symbolic and neural network approaches, suggesting that emergent reasoning in neural networks depends on the emergence of symbolic mechanisms.
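The three-stage pipeline can be illustrated with a purely symbolic toy. This sketch rests on several assumptions. It uses an identity-rule task in which each in-context example is a token triple following a pattern such as ABA, and the query supplies the first two tokens of a new triple. The function names and tokens are hypothetical. Explicit Python logic stands in for what the abstract attributes to attention heads. It is meant only to show how abstraction, induction, and retrieval compose, not how the model implements them.

```python
from collections import Counter
from typing import Optional

VARIABLE_NAMES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def abstract_variables(tokens: list[str]) -> list[str]:
    """Toy stand-in for symbol abstraction (hypothetical): assign each token an
    abstract variable based only on identity relations within the sequence."""
    binding: dict[str, str] = {}
    variables = []
    for tok in tokens:
        if tok not in binding:
            # First occurrence of a token gets the next unused variable name.
            binding[tok] = VARIABLE_NAMES[len(binding)]
        variables.append(binding[tok])
    return variables


def induce_next_variable(examples: list[list[str]], query_vars: list[str]) -> str:
    """Toy stand-in for symbolic induction (hypothetical): predict the next
    variable by matching the query's variable prefix against the examples."""
    k = len(query_vars)
    votes = Counter(
        ex_vars[k]
        for ex_vars in map(abstract_variables, examples)
        if len(ex_vars) > k and ex_vars[:k] == query_vars
    )
    if not votes:
        raise ValueError("no in-context example matches the query prefix")
    return votes.most_common(1)[0][0]


def retrieve_token(query_tokens: list[str], query_vars: list[str],
                   predicted_var: str) -> Optional[str]:
    """Toy stand-in for retrieval (hypothetical): return the query token bound
    to the predicted variable, or None if that variable is unbound."""
    for tok, var in zip(query_tokens, query_vars):
        if var == predicted_var:
            return tok
    return None


def predict_next(examples: list[list[str]], query_tokens: list[str]) -> Optional[str]:
    # Stage 1: tokens -> variables; Stage 2: induce next variable; Stage 3: variable -> token.
    query_vars = abstract_variables(query_tokens)
    predicted_var = induce_next_variable(examples, query_vars)
    return retrieve_token(query_tokens, query_vars, predicted_var)


# Hypothetical ABA examples with arbitrary tokens.
aba_examples = [["iac", "ilege", "iac"], ["ptolem", "ghi", "ptolem"]]
print(predict_next(aba_examples, ["lung", "flor"]))  # lung
```

Because the induction step only sees variables (here A, B, A), the same rule transfers to tokens that never appeared in the examples. The final token is recovered only at the retrieval step, by looking up which query token is bound to the predicted variable.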