LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
Proving a theorem in Lean 4 often requires identifying a diverse set of library lemmas that, used together, yield a succinct proof. This task is referred to as global premise retrieval. LeanSearch v2 targets it directly and, on the benchmarks reported by its authors, outperforms existing tools.
Understanding Global Premise Retrieval
Global premise retrieval remains largely unaddressed by existing tools. Semantic search engines locate individual declarations that match a specific query, while premise-selection systems predict useful lemmas one tactic step at a time. Neither approach recovers the complete set of premises required for an entire theorem. LeanSearch v2 addresses this gap with a two-mode retrieval system.
Features of LeanSearch v2
LeanSearch v2 provides two modes of operation:
- Standard Mode: This mode combines a hierarchy-informalized Mathlib corpus with an embedding-reranker pipeline. It achieves state-of-the-art single-query retrieval without domain-specific fine-tuning. In benchmark tests it reached a normalized Discounted Cumulative Gain at rank 10 (nDCG@10) of 0.62, compared with 0.53 for the next-best system.
- Reasoning Mode: Building on the standard mode, the reasoning mode targets global premise retrieval through iterative sketch-retrieve-reflect cycles. These cycles let it recover a substantial portion of the premise groups a theorem requires.
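The sketch-retrieve-reflect cycle can be illustrated with a minimal Python sketch. This is not the LeanSearch v2 implementation: the function name `reasoning_retrieve`, the three callbacks, and the `max_rounds` and `top_k` parameters are all hypothetical, chosen only to show the shape of such a loop under the assumption that each round proposes queries, merges single-query results, and then decides whether to continue.

```python
from typing import Callable, List


def reasoning_retrieve(
    theorem: str,
    sketch: Callable[[str, List[str]], List[str]],
    retrieve: Callable[[str], List[str]],
    reflect: Callable[[str, List[str]], bool],
    max_rounds: int = 3,
    top_k: int = 10,
) -> List[str]:
    """Collect premises for a whole theorem over several rounds.

    All three callbacks are hypothetical stand-ins:
      sketch(theorem, found)  -> search queries for steps not yet covered
      retrieve(query)         -> ranked premise names for one query
      reflect(theorem, found) -> True once the premises look sufficient
    """
    found: List[str] = []
    for _ in range(max_rounds):
        # Sketch: propose queries given what has been found so far.
        queries = sketch(theorem, found)
        # Retrieve: run each query and merge results in first-seen order.
        for query in queries:
            for name in retrieve(query):
                if name not in found:
                    found.append(name)
        # Reflect: stop early if the collected set is judged sufficient.
        if reflect(theorem, found):
            break
    return found[:top_k]
```

In a real system the `sketch` and `reflect` steps would presumably be driven by a language model and `retrieve` by the standard-mode search; here they are plain callables so the control flow can be tested in isolation.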
Performance Metrics
On a 69-query benchmark of research-level theorems from Mathlib, LeanSearch v2's reasoning mode recovered 46.1% of ground-truth premise groups within the top 10 retrieved candidates. On the same benchmark, strong reasoning retrieval systems reached 38.0% and traditional premise-selection baselines reached only 9.3%.
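A group-level recall metric of this kind might be computed as in the following sketch. The exact definition used in the benchmark is not given here, so this assumes that each ground-truth group is a set of interchangeable premise names and that a group counts as recovered when at least one of its members appears in the top k results. The function name `group_recall_at_k` is hypothetical.

```python
from typing import List, Sequence, Set


def group_recall_at_k(
    ranked: Sequence[str], groups: List[Set[str]], k: int = 10
) -> float:
    """Fraction of ground-truth premise groups hit by the top-k results.

    Assumption: a group is recovered if any one of its members is
    among the first k retrieved names.
    """
    if not groups:
        return 0.0
    top = set(ranked[:k])
    # Count groups that share at least one name with the top-k set.
    recovered = sum(1 for group in groups if group & top)
    return recovered / len(groups)
```

Averaging this value over all queries would give a benchmark-level figure comparable in form to the percentages above, under the stated assumption.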
Downstream Evaluation and Impact
A controlled downstream evaluation held the prover loop fixed and varied only the retrieval system. With LeanSearch v2 as the retriever, the prover achieved the highest proof success rate, 20%, compared with 16% for the next-best retrieval system and 4% with no retrieval at all. These results indicate that retrieval quality directly affects proof generation outcomes.
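The idea of a controlled comparison, where the prover stays fixed and only the retriever changes, can be sketched as follows. This is an illustrative harness, not the evaluation code from the project: `success_rates`, the `Retriever` and `Prover` type aliases, and the use of `None` for the no-retrieval condition are all assumptions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces: a retriever maps a theorem statement to premise
# names; a prover returns True if it finds a proof given those premises.
Retriever = Callable[[str], List[str]]
Prover = Callable[[str, List[str]], bool]


def success_rates(
    theorems: List[str],
    prover: Prover,
    retrievers: Dict[str, Optional[Retriever]],
) -> Dict[str, float]:
    """Run the same prover with each retriever and report success rates.

    A retriever of None models the no-retrieval condition.
    """
    rates: Dict[str, float] = {}
    for name, retriever in retrievers.items():
        solved = 0
        for theorem in theorems:
            premises = retriever(theorem) if retriever is not None else []
            if prover(theorem, premises):
                solved += 1
        rates[name] = solved / len(theorems) if theorems else 0.0
    return rates
```

Because the prover and the theorem list are shared across conditions, any difference in the returned rates is attributable to the retriever alone.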
Open Source and Accessibility
The developers of LeanSearch v2 have open-sourced all relevant code, data, and benchmarks, which are available on GitHub. The standard mode is also publicly accessible, with API access, at LeanSearch.net.
As Lean 4 continues to gain traction, tools like LeanSearch v2 make it more practical to find the full set of lemmas a proof needs, which in turn supports further work on automated theorem proving and formal verification.
