Tree-Based Premise Selection for Lean4

0citations

Project

Citations

#2219

in NeurIPS 2025

of 5858 papers

Authors

Data Points

Authors

Zichen Wang Anjie Dong Zaiwen Wen

Topics

premise selection expression tree structure weisfeiler-lehman kernel tree edit distance theorem proving structure-aware similarity multi-stage filtering cluster-based optimization

Abstract

Premise selection is a critical bottleneck in interactive theorem proving, particularly with large libraries. Existing methods, primarily relying on semantic embeddings, often fail to effectively leverage the rich structural information inherent in mathematical expressions. This paper proposes a novel framework for premise selection based on the structure of expression trees. The framework enhances premise selection ability by explicitly utilizing the structural information of Lean expressions and by means of the simplified tree representation obtained via common subexpression elimination. Our method employs a multi-stage filtering pipeline, incorporating structure-aware similarity measures including the Weisfeiler-Lehman kernel, tree edit distance, $\texttt{Const}$ node Jaccard similarity, and collapse-match similarity. An adaptive fusion strategy combines these metrics for refined ranking. To handle large-scale data efficiently, we incorporate cluster-based search space optimization and structural compatibility constraints. Comprehensive evaluation on a large theorem library extracted from Mathlib4 demonstrates that our method significantly outperforms existing premise retrieval tools across various metrics. Experimental analysis, including ablation studies and parameter sensitivity analysis, validates the contribution of individual components and highlights the efficacy of our structure-aware approach and multi-metric fusion.

Citation History

Jan 26, 2026

Jan 27, 2026

Feb 1, 2026