Automated ACSL Annotation Evaluation for Formal Verification

Evaluating LLM-Generated ACSL Annotations for Formal Verification

Summary: arXiv:2602.13851v2

Abstract: Formal specifications are crucial for building verifiable and dependable software systems, yet generating accurate and verifiable specifications for real-world C programs remains challenging. This paper empirically evaluates the extent to which formal-analysis tools can automatically generate and verify ACSL specifications without human or learning-based assistance.

Introduction

The demand for high-quality software systems has never been greater, particularly in safety-critical domains such as healthcare, finance, and transportation. As software complexity continues to rise, so does the need for rigorous methods to ensure its reliability. Formal specifications serve as a foundation for building verifiable software systems, yet the task of generating these specifications, especially for real-world C programs, presents significant challenges.

Methodology

The paper presents a controlled study of how well several tools generate ACSL (ANSI/ISO C Specification Language) annotations. The authors take a recently released dataset of 506 C programs and repurpose it from interactive, developer-driven workflows to an automated evaluation setting.

  • Five ACSL generation systems were analyzed:
    • A rule-based Python script
    • Frama-C’s RTE plugin
    • DeepSeek-V3.2, a large language model
    • GPT-5.2, a large language model
    • OLMo 3.1 32B Instruct, a large language model
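
To make concrete what these systems are asked to produce, here is a minimal sketch of an ACSL-annotated C function. It is an illustrative example only: the function max_of and its contract are hypothetical and are not taken from the paper's dataset or from any of the five generators.

```c
#include <stddef.h>

/* Hypothetical example: returns the largest element of a non-empty array.
 * The ACSL contract below states what a generator would need to produce:
 * preconditions (requires), side effects (assigns), postconditions (ensures). */

/*@ requires n > 0;
  @ requires \valid_read(a + (0 .. n - 1));
  @ assigns \nothing;
  @ ensures \forall integer k; 0 <= k < n ==> \result >= a[k];
  @ ensures \exists integer k; 0 <= k < n && \result == a[k];
  @*/
int max_of(const int *a, size_t n)
{
    int best = a[0];

    /* Loop annotations give the prover the inductive facts it needs. */
    /*@ loop invariant 1 <= i <= n;
      @ loop invariant \forall integer k; 0 <= k < i ==> best >= a[k];
      @ loop invariant \exists integer k; 0 <= k < i && best == a[k];
      @ loop assigns i, best;
      @ loop variant n - i;
      @*/
    for (size_t i = 1; i < n; i++) {
        if (a[i] > best)
            best = a[i];
    }
    return best;
}
```

A file like this (say, a hypothetical max_of.c) could be handed to Frama-C with a command along the lines of `frama-c -wp -wp-rte max_of.c`, where `-wp-rte` adds the runtime-error guards that the RTE plugin generates. The exact options used in the study are not given in this summary, so treat the command as an assumption.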

Evaluation Process

All generated ACSL specifications were verified under controlled conditions using the Frama-C WP plugin, which is powered by multiple SMT (Satisfiability Modulo Theories) solvers. This setup allowed for a direct comparison of several factors:

  • Annotation Quality: Assessing the correctness and completeness of generated specifications.
  • Solver Sensitivity: Evaluating how different solvers reacted to the generated annotations.
  • Proof Stability: Analyzing the consistency of verification results across multiple runs.
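
As a rough illustration of how proof stability could be quantified, the sketch below counts the proof obligations whose outcome changes between repeated runs and computes the fraction proved in each run. The data layout, the names goal_runs, proved_ratio, and unstable_goals, and the choice of three runs are all assumptions made for this example; the summary does not say which metrics the paper actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_RUNS 3 /* hypothetical number of repeated verification runs */

/* One proof obligation and its outcome in each run (true = proved). */
typedef struct {
    bool proved[N_RUNS];
} goal_runs;

/* Fraction of goals proved in the given run; 0.0 for an empty set. */
double proved_ratio(const goal_runs *goals, size_t n_goals, size_t run)
{
    if (n_goals == 0)
        return 0.0;

    size_t proved = 0;
    for (size_t g = 0; g < n_goals; g++) {
        if (goals[g].proved[run])
            proved++;
    }
    return (double)proved / (double)n_goals;
}

/* Number of goals whose outcome differs from run 0 in at least one run. */
size_t unstable_goals(const goal_runs *goals, size_t n_goals)
{
    size_t unstable = 0;
    for (size_t g = 0; g < n_goals; g++) {
        for (size_t r = 1; r < N_RUNS; r++) {
            if (goals[g].proved[r] != goals[g].proved[0]) {
                unstable++;
                break; /* count each goal at most once */
            }
        }
    }
    return unstable;
}
```

The same bookkeeping could be reused for solver sensitivity by letting each column stand for a different SMT solver instead of a repeated run, so that a goal counts as sensitive when solvers disagree on it.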

Results

The findings from this study provide new empirical evidence on the capabilities and limitations of automated ACSL generation systems. While some models demonstrated promising results, others struggled to produce accurate annotations. The study highlights the importance of understanding the trade-offs between automated generation methods and human expertise in software verification.

Conclusion

This research contributes to the growing body of literature on formal verification and automated specification generation. By empirically evaluating several generation tools under the same verification setup, the study clarifies their effectiveness and limitations. These insights should be useful to researchers and practitioners who want to improve software reliability through better specification generation.

Future Work

Further research is needed to optimize the performance of automated ACSL generation systems. Exploring hybrid approaches that combine human expertise with machine-generated annotations could yield better results and pave the way for more robust software verification processes.


Lazarus Omolua
