Toward Executable Repository-Level Code Generation via Environment Alignment
Source: arXiv:2604.03622v1
Type: Cross
Abstract
Large language models (LLMs) have achieved strong performance on code generation, but existing methods still struggle with repository-level code generation under executable validation. Under this evaluation setting, success is determined not by the plausibility of isolated code fragments, but by whether a generated multi-file repository can be successfully installed, have its dependencies and internal references resolved, be launched, and be validated in a real execution environment.
Introduction
LLMs have changed how software is written, but generating executable code at the repository level remains difficult. Current approaches often produce repositories that fail at one or more of the steps execution requires: installation, dependency and internal reference resolution, launch, or validation.
The EnvGraph Framework
We introduce EnvGraph, a framework that treats repository executability as an environment alignment problem. EnvGraph models two conditions that must both hold for a repository to execute:
- External Dependency Satisfaction: Ensuring that all external libraries and packages required by the repository are available and correctly configured.
- Repository-Internal Reference Resolution: Guaranteeing that all internal references within the codebase are correctly linked and accessible.
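The two conditions can be illustrated with a small static check. This is a minimal sketch, not EnvGraph's implementation: the function name `classify_imports`, the in-memory `files` mapping, and the `declared` set are all hypothetical, and it assumes a Python repository whose declared dependency names match their import names (which is not always true in practice).

```python
import ast
import sys

def classify_imports(files, declared):
    """Report imports that violate either condition.

    files:    hypothetical mapping of relative path -> source text
    declared: set of top-level package names declared as dependencies
    """
    py_files = {p: s for p, s in files.items() if p.endswith(".py")}
    # Internal modules: "app/utils.py" -> "app.utils"
    internal = {p[:-3].replace("/", ".") for p in py_files}
    # Packages: "app.__init__" also makes "app" importable
    internal |= {m[: -len(".__init__")] for m in internal if m.endswith(".__init__")}
    internal_roots = {m.split(".")[0] for m in internal}
    # Standard-library names (Python 3.10+); empty fallback on older versions
    stdlib = getattr(sys, "stdlib_module_names", frozenset())

    report = {"missing_external": set(), "unresolved_internal": set()}
    for path, source in py_files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]  # relative imports are skipped in this sketch
            else:
                continue
            for name in names:
                root = name.split(".")[0]
                if root in internal_roots:
                    # Condition 2: internal reference must point at an existing module
                    if name not in internal:
                        report["unresolved_internal"].add((path, name))
                elif root not in stdlib and root not in declared:
                    # Condition 1: external dependency must be declared
                    report["missing_external"].add((path, name))
    return report

# Toy repository: one undeclared dependency and one dangling internal import
files = {
    "app/__init__.py": "",
    "app/main.py": "import requests\nfrom app.utils import helper\nfrom app.models import User\n",
    "app/utils.py": "def helper():\n    return 1\n",
}
report = classify_imports(files, declared={"flask"})
print(report)
```

A static check like this only approximates the two conditions. The setting described above validates them by actually installing and launching the repository.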
Methodology
EnvGraph uses a dual-layer environment representation: one layer captures the external environment, the other the internal structure of the repository. It uses execution evidence to guide code generation through two components:
- Execution-Evidence-Based Attribution: Analyzing successful execution patterns to inform the generation of new code.
- Unified Targeted Revision Mechanism: A systematic approach that iteratively refines the generated code by aligning it with the identified execution evidence.
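The interplay of the two layers, attribution, and revision can be sketched as a toy loop. Everything here is an illustrative assumption rather than the paper's algorithm: `RepoState`, `execute`, `attribute`, and `revise` are hypothetical names, the "execution" is simulated by a set lookup instead of a real install and launch, and attribution is assumed to work from failed imports only.

```python
from dataclasses import dataclass, field

@dataclass
class RepoState:
    declared: set = field(default_factory=set)  # external layer: declared dependencies
    modules: set = field(default_factory=set)   # internal layer: modules present in the repo
    imports: set = field(default_factory=set)   # names the generated code tries to import

def execute(state):
    """Stand-in for a real run: return the imports that neither layer satisfies."""
    return sorted(n for n in state.imports
                  if n not in state.declared and n not in state.modules)

def attribute(name, internal_root):
    """Assign a failure to a layer based on the top-level package name."""
    return "internal" if name.split(".")[0] == internal_root else "external"

def revise(state, internal_root="app", max_rounds=5):
    """Fix one attributed failure per round until execution succeeds."""
    log = []
    for _ in range(max_rounds):
        failures = execute(state)
        if not failures:
            break
        name = failures[0]
        layer = attribute(name, internal_root)
        if layer == "external":
            state.declared.add(name)   # stands in for adding a dependency
        else:
            state.modules.add(name)    # stands in for generating the missing module
        log.append((name, layer))
    return log

state = RepoState(
    declared={"flask"},
    modules={"app.main"},
    imports={"flask", "requests", "app.models"},
)
log = revise(state)
print(log)
```

The point of the sketch is the control flow: each round gathers evidence, attributes one failure to a layer, and applies a revision scoped to that layer instead of regenerating the whole repository.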
Evaluation and Results
We evaluated EnvGraph on repository-level code generation with three representative backbone LLMs and compared it against environment-aware and repository-level baselines. The results show that:
- EnvGraph consistently achieved superior performance across various repository-level benchmarks.
- It outperformed the strongest non-EnvGraph baseline by an absolute margin of 5.72–5.87 percentage points in Functional Correctness.
- It also exceeded baseline performance by 4.58–8.66 percentage points in Non-Functional Quality.
Conclusion
EnvGraph frames repository-level executability as an environment alignment problem and uses execution evidence to drive targeted revision of generated repositories. Future work will focus on refining the model and exploring its application across more programming domains.
