Sliceformer: Advanced Static Program Slicing with Language Models

Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding

Static program slicing isolates the statements of a program that are relevant to a chosen variable, typically at a chosen program point (the slicing criterion). It is useful for debugging, maintaining, and understanding large codebases. Recent work has applied language models (LMs) to predict code slices automatically. However, existing approaches often model dependencies inaccurately and generate imprecise output, including hallucinated tokens and irrelevant statements.

To tackle these challenges, researchers have introduced Sliceformer, a framework that reformulates static program slicing as a sequence-to-sequence task: the model takes the program as input and generates the slice as output. It uses small language models, including CodeT5+, and introduces two techniques aimed at improving slicing accuracy.
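To make the sequence-to-sequence framing concrete, here is a minimal sketch of how one training pair might be built. The `SlicingExample` class, the `to_seq2seq_pair` helper, and the `slice <var> @ line <n>` header format are all assumptions made for illustration; the source does not describe Sliceformer's actual input encoding.

```python
from dataclasses import dataclass


@dataclass
class SlicingExample:
    source: str          # full program text
    criterion_var: str   # variable of interest (hypothetical field)
    criterion_line: int  # 1-based line of the slicing criterion


def to_seq2seq_pair(example: SlicingExample, slice_lines: list[int]) -> tuple[str, str]:
    """Build an (input_text, target_text) pair for a seq2seq model.

    The header format below is an assumption, not Sliceformer's encoding.
    """
    lines = example.source.splitlines()
    header = f"slice {example.criterion_var} @ line {example.criterion_line}"
    input_text = header + "\n" + example.source
    # The target is the subset of source lines that belong to the slice.
    target_text = "\n".join(lines[i - 1] for i in sorted(slice_lines))
    return input_text, target_text


program = "\n".join([
    "total = 0",
    "count = 0",
    "for x in data:",
    "    total += x",
    "    count += 1",
    "print(total)",
])
example = SlicingExample(program, "total", 6)
# Backward slice on `total` at line 6: lines 1, 3, 4, and 6.
input_text, target_text = to_seq2seq_pair(example, [1, 3, 4, 6])
print(target_text)
```

Note that the target drops both statements involving `count`, since neither can affect `total` at the final `print`.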

Innovations in Sliceformer

Dataflow-Aware Pretraining: Sliceformer employs specialized pretraining objectives designed to improve dependency modeling. Using data flow graphs (DFGs), it teaches the model about data dependencies through two objectives: dataflow-preserving statement permutation and dataflow-aware span corruption. The goal is for the model to learn how data flows through the code, which in turn should lead to more accurate slice predictions.
Constrained Decoding Mechanism: To prevent the model from generating hallucinated outputs, Sliceformer integrates a constrained decoding mechanism. It enforces both lexical and syntactic constraints during generation, reducing the likelihood of producing irrelevant or nonsensical code segments.
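The two ideas above can be sketched in a few lines of Python. This is a simplified illustration, not Sliceformer's implementation: it assumes straight-line code whose statements come with precomputed def/use sets, and it stands in for lexical constraints with a statement-level rule (the decoder may only emit source lines, in source order). Syntactic constraints are not modeled. The names `dependency_edges`, `dataflow_preserving_permutation`, `constrained_greedy_decode`, and `score_fn` are hypothetical.

```python
import random


def dependency_edges(stmts):
    """stmts: list of (defs, uses) sets, one per statement.

    Returns pairs (i, j) with i < j whose relative order must be kept
    because of a read-after-write, write-after-read, or
    write-after-write conflict.
    """
    edges = set()
    for j, (defs_j, uses_j) in enumerate(stmts):
        for i in range(j):
            defs_i, uses_i = stmts[i]
            if defs_i & uses_j or uses_i & defs_j or defs_i & defs_j:
                edges.add((i, j))
    return edges


def dataflow_preserving_permutation(stmts, rng):
    """Return a random statement order that respects every dependency edge."""
    edges = dependency_edges(stmts)
    remaining = set(range(len(stmts)))
    order = []
    while remaining:
        # A statement is ready once all of its predecessors are placed.
        ready = sorted(
            j for j in remaining
            if all(i not in remaining for (i, k) in edges if k == j)
        )
        pick = rng.choice(ready)
        order.append(pick)
        remaining.remove(pick)
    return order


def constrained_greedy_decode(score_fn, n_lines):
    """Greedy decoding over source line indices with a masked choice set.

    score_fn(prefix) is a hypothetical model call returning n_lines + 1
    scores; the last index means end-of-slice. Only lines after the last
    emitted one are allowed, so the output is always an ordered subset
    of the source lines.
    """
    eos = n_lines
    prefix = []
    while True:
        scores = score_fn(prefix)
        start = prefix[-1] + 1 if prefix else 0
        allowed = list(range(start, n_lines)) + [eos]
        best = max(allowed, key=lambda t: scores[t])
        if best == eos:
            return prefix
        prefix.append(best)


# a = 1; b = 2; c = a + b; d = 5
stmts = [({"a"}, set()), ({"b"}, set()), ({"c"}, {"a", "b"}), ({"d"}, set())]
order = dataflow_preserving_permutation(stmts, random.Random(0))
print(order)
```

In the permutation, `c = a + b` always lands after both of its definitions, while `d = 5` can move freely. In the decoder, a model that keeps scoring the same line highest cannot repeat it, because already-passed lines are masked out at every step.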

Evaluation and Results

To assess the effectiveness of Sliceformer, extensive evaluations were conducted on benchmark datasets for Java and Python program slicing. Sliceformer consistently outperformed state-of-the-art baselines, achieving up to a 22% improvement in Exact Match, a metric that counts a prediction as correct only when it matches the reference slice exactly. The result indicates that the framework produces more accurate and relevant code slices than prior approaches.
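For readers unfamiliar with the metric, here is a minimal sketch of how an Exact Match score can be computed over a set of predicted slices. The whitespace normalization (dropping blank lines and trailing spaces) is an assumption; the source does not say how Sliceformer's evaluation normalizes slices, and the toy inputs in the test are not results from the paper.

```python
def normalize(slice_text: str) -> list[str]:
    """Drop blank lines and trailing whitespace; keep indentation."""
    return [line.rstrip() for line in slice_text.splitlines() if line.strip()]


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference slice."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Because the metric gives no partial credit, a slice with a single extra or missing statement scores the same as a completely wrong one.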

By addressing dependency modeling and hallucination directly, Sliceformer improves the accuracy of LM-based program slicing and provides a basis for further work on automated software analysis.

Conclusion

Integrating language models into established software engineering techniques opens up new opportunities. Sliceformer shows how pairing task-specific pretraining objectives with constrained decoding can improve the automation of a complex analysis task. The implications may extend beyond program slicing to other areas of software development and maintenance.

Researchers and practitioners are encouraged to explore the capabilities of Sliceformer and consider its application in their own projects to enhance productivity and code quality.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
