Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators
As software correctness grows in importance, so does the reliability of Satisfiability Modulo Theories (SMT) solvers. These solvers underpin many modern systems and programming-language tools, supporting tasks such as symbolic execution and automated verification. A recent paper, arXiv:2508.20340v4, introduces Once4All, a framework that combines Large Language Models (LLMs) with fuzzing to improve the testing of SMT solvers.
Challenges in SMT Solver Testing
Despite the importance of SMT solvers, testing them remains difficult. Solvers evolve rapidly, and prior testing techniques have struggled to keep pace, particularly in generating high-quality test formulas. The paper identifies two primary obstacles:
- Nearly half of the formulas generated by previous methods are syntactically invalid.
- Iterative interactions with LLMs introduce substantial computational overhead, making the fuzzing process inefficient.
Introducing Once4All
Once4All addresses these challenges by shifting the focus from direct formula generation to the synthesis of reusable generators for logical expressions. The framework employs LLMs in two significant ways:
- Automatic Extraction of Context-Free Grammars (CFGs): Once4All extracts CFGs for SMT theories, including specific extensions tailored to various solvers, directly from documentation. This process ensures that the generated formulas adhere to the syntactic rules defined for each theory.
- Composable Boolean Term Generators: The framework synthesizes generators that produce Boolean terms compliant with the extracted grammars. This allows for the iterative generation of terms that can be seamlessly integrated into existing formula structures.
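To make the idea of a grammar-compliant term generator concrete, here is a minimal sketch in Python. It is not the paper's implementation: the toy grammar, the `GRAMMAR` table, and the `generate` function are hypothetical, hand-written for a small fragment of integer arithmetic, whereas Once4All obtains its grammars and generators from an LLM.

```python
import random

# Hypothetical toy grammar (an assumption, not extracted from any solver
# documentation). Each sort maps to productions of the form
# (operator, [argument sorts]). "var" and "const" are leaf productions.
GRAMMAR = {
    "Bool": [
        ("and", ["Bool", "Bool"]),
        ("not", ["Bool"]),
        ("<=", ["Int", "Int"]),
        ("=", ["Int", "Int"]),
    ],
    "Int": [
        ("+", ["Int", "Int"]),
        ("*", ["Int", "Int"]),
        ("var", []),
        ("const", []),
    ],
}


def generate(sort, rng, variables, depth=3):
    """Return a random term of the given sort as an SMT-LIB style string."""
    productions = GRAMMAR[sort]
    if depth <= 0:
        # Out of depth budget: drop productions that recurse on the same
        # sort, so generation is guaranteed to terminate.
        productions = [(op, args) for op, args in productions if sort not in args]
    op, args = rng.choice(productions)
    if op == "var":
        return rng.choice(variables)
    if op == "const":
        return str(rng.randint(0, 9))
    children = [generate(arg, rng, variables, depth - 1) for arg in args]
    return "(" + op + " " + " ".join(children) + ")"


if __name__ == "__main__":
    rng = random.Random(0)
    # Every Boolean term is built only from grammar productions, so it is
    # well-formed by construction.
    print(generate("Bool", rng, ["x", "y"]))
```

Because every term is assembled from grammar productions, syntactic validity follows from the construction itself and no model is consulted at generation time.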
Fuzzing Process and Advantages
During the fuzzing process, Once4All utilizes structural skeletons derived from existing valid formulas. These skeletons are then populated with terms produced by the LLM-synthesized generators. This design not only guarantees syntactic validity but also enhances semantic diversity, leading to more comprehensive testing.
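The skeleton step can be sketched as follows. This is an illustration under assumptions rather than the paper's algorithm: it assumes a skeleton keeps the Boolean connectives of a seed formula and replaces every other subterm with a hole, and the names `parse`, `to_skeleton`, `fill`, `render`, and `HOLE` are all hypothetical.

```python
# Assumption: only these connectives are kept as skeleton structure.
CONNECTIVES = {"and", "or", "not", "=>", "xor"}
HOLE = "<hole>"


def parse(text):
    """Parse one S-expression into nested lists of string tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                node, pos = read(pos)
                items.append(node)
            return items, pos + 1
        return tokens[pos], pos + 1

    node, _ = read(0)
    return node


def to_skeleton(node):
    """Keep Boolean connectives; replace every other subterm with a hole."""
    if isinstance(node, list) and node and node[0] in CONNECTIVES:
        return [node[0]] + [to_skeleton(child) for child in node[1:]]
    return HOLE


def fill(skeleton, make_term):
    """Fill holes left to right with terms returned by make_term()."""
    if skeleton == HOLE:
        return make_term()
    return [skeleton[0]] + [fill(child, make_term) for child in skeleton[1:]]


def render(node):
    """Turn nested lists back into an S-expression string."""
    if isinstance(node, list):
        return "(" + " ".join(render(child) for child in node) + ")"
    return node


if __name__ == "__main__":
    seed = "(and (> x 0) (or (= y 1) (not (< x y))))"
    skeleton = to_skeleton(parse(seed))
    print(render(skeleton))  # (and <hole> (or <hole> (not <hole>)))

    # Stand-in for a synthesized generator: a fixed list of Boolean terms.
    terms = iter(["(< x 1)", "(= y 2)", "(>= x y)"])
    print(render(fill(skeleton, lambda: next(terms))))
```

In a real fuzzer the filled terms would also have to use only symbols declared in the surrounding script; this sketch leaves declaration handling out.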
A significant advantage of Once4All is its efficiency. The framework needs only a one-time interaction with the LLM to extract CFGs and synthesize term generators. The generators are then reused throughout fuzzing, which greatly reduces runtime cost compared to approaches that query an LLM iteratively.
Evaluation and Results
Once4All was evaluated on two leading SMT solvers, Z3 and cvc5, and identified a total of 43 confirmed bugs. Of these, 40 have already been fixed by the developers, which demonstrates the framework's practical impact on SMT solver reliability.
Conclusion
Once4All is a notable advance in SMT solver testing. By using Large Language Models once, to synthesize reusable generators rather than to produce individual formulas, the framework addresses both obstacles identified above: syntactically invalid formulas and the overhead of repeated LLM interaction.
