WybeCoder: Verified Imperative Code Generation
arXiv:2603.29088v1 (cross-listed)
Abstract
Recent progress in large language models (LLMs) has advanced automatic code generation and formal theorem proving, yet software verification has not seen the same improvement. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a recent framework that combines automatic verification condition generation and SMT solvers with interactive proofs in Lean.
Introduction to WybeCoder
As LLMs generate more and more code, it becomes important to check that code's correctness while it is being generated rather than afterwards. WybeCoder addresses this by combining code synthesis with formal verification: the agent produces a program together with the invariants and proofs that establish its correctness.
Key Features of WybeCoder
WybeCoder has four main features:
- Prove-as-You-Generate Development: The framework allows for simultaneous generation of code, invariants, and proofs, ensuring that all aspects of software correctness are addressed in tandem.
- Integration with Lean: WybeCoder builds on a recent framework that combines automatic verification condition generation and SMT solvers with interactive proofs in Lean, so automated solving and interactive proof complement each other.
- Benchmark Translation: To enable systematic evaluation, two functional-verification benchmarks, Verina and Clever, are translated into equivalent imperative code specifications.
- Scalable Performance: WybeCoder’s performance improves consistently as the approach is scaled, including on complex algorithms such as Heapsort.
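To make the idea of code, invariants, and verification conditions evolving together more concrete, here is a minimal Python sketch. It is not WybeCoder's implementation and does not use Lean or an SMT solver: the program `sum_to`, the invariant `closed_form_invariant`, and the checker `check_vcs` are hypothetical names chosen for illustration, and exhaustive enumeration over small states stands in for the solver.

```python
from typing import Callable, List

# (i, acc, n) -> bool; a hypothetical type alias for loop invariants.
Invariant = Callable[[int, int, int], bool]


def sum_to(n: int) -> int:
    """Imperative program under verification: returns 0 + 1 + ... + n for n >= 0."""
    i, acc = 0, 0
    while i < n:
        i += 1
        acc += i
    return acc


def closed_form_invariant(i: int, acc: int, n: int) -> bool:
    """Candidate loop invariant: 0 <= i <= n and acc == i * (i + 1) / 2."""
    return 0 <= i <= n and 2 * acc == i * (i + 1)


def check_vcs(inv: Invariant, bound: int = 12) -> List[str]:
    """Check the three verification conditions of the loop in sum_to.

    Assumption: brute-force enumeration replaces an SMT solver. It only
    covers states with n < bound, so it gives evidence, not a proof.
    """
    failed = []
    for n in range(bound):
        # VC 1 (initiation): the invariant holds when the loop is entered.
        if not inv(0, 0, n):
            failed.append(f"initiation: n={n}")
        for i in range(bound + 1):
            for acc in range(bound * bound):
                if not inv(i, acc, n):
                    continue  # only states satisfying the invariant matter
                if i < n:
                    # VC 2 (preservation): one iteration keeps the invariant.
                    i2 = i + 1
                    acc2 = acc + i2
                    if not inv(i2, acc2, n):
                        failed.append(f"preservation: n={n}, i={i}, acc={acc}")
                elif 2 * acc != n * (n + 1):
                    # VC 3 (postcondition): invariant plus loop exit implies the spec.
                    failed.append(f"postcondition: n={n}, i={i}, acc={acc}")
    return failed
```

Calling `check_vcs(closed_form_invariant)` returns an empty list, while a weaker candidate such as `lambda i, acc, n: 0 <= i <= n` passes initiation and preservation but fails the postcondition check. A failing condition of this kind is the sort of feedback that would prompt an agent to strengthen the invariant or revise the code.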
Performance Evaluation
To evaluate the effectiveness of WybeCoder, extensive testing was conducted on a variety of complex algorithms. The results were promising:
- For Heapsort, performance improved consistently, and the framework generated hundreds of lines of verified code.
- Along the way, the system synthesized dozens of valid invariants and dispatched numerous subgoals.
- WybeCoder solved 74% of Verina tasks and 62% of Clever tasks, a significant improvement over previous approaches.
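For a sense of what invariants for Heapsort look like, the sketch below annotates a plain Python heapsort with two of them as runtime assertions. These invariants are textbook ones chosen for illustration, not the ones WybeCoder synthesized, and an assertion checked at run time is much weaker than a machine-checked proof: it validates only the inputs actually executed. The helper names `is_max_heap` and `sift_down` are assumptions of this sketch.

```python
from typing import List


def is_max_heap(a: List[int], size: int) -> bool:
    """True if a[:size] satisfies the max-heap property (parent >= child)."""
    return all(a[(k - 1) // 2] >= a[k] for k in range(1, size))


def sift_down(a: List[int], root: int, size: int) -> None:
    """Restore the heap property below `root` within a[:size]."""
    while True:
        child = 2 * root + 1
        if child >= size:
            return
        # Pick the larger of the two children.
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort(a: List[int]) -> List[int]:
    """Sort `a` in place in ascending order, checking invariants as it goes."""
    n = len(a)
    # Phase 1: build a max-heap bottom-up.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    assert is_max_heap(a, n)

    # Phase 2: repeatedly move the maximum to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        # Invariant 1: the unsorted prefix a[:end + 1] is a max-heap.
        assert is_max_heap(a, end + 1)
        # Invariant 2: the suffix a[end + 1:] is sorted and bounds the prefix.
        assert all(a[k] <= a[k + 1] for k in range(end + 1, n - 1))
        assert end + 1 == n or a[0] <= a[end + 1]
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

In a verified setting, each assertion would instead become a proof obligation that has to hold for every possible input, which is why even a short algorithm like this one can require many invariants and subgoals.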
Conclusion and Future Directions
WybeCoder narrows the gap between code generation and formal verification, and it opens up new avenues for the automated construction of large-scale datasets of verified imperative code. Future research will focus on further enhancing its capabilities and exploring its applications across diverse programming languages and paradigms.
As the landscape of software development continues to evolve, frameworks like WybeCoder will be instrumental in ensuring that software not only functions correctly but is also formally verified, paving the way for more reliable and efficient coding practices in the industry.
