Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects
Summary: arXiv:2604.06373v1 (cross-listed)
Abstract: A new generation of AI coding tools, including AI-powered IDEs equipped with agentic capabilities, can generate code within the context of a project. These AI IDEs are increasingly perceived as capable of producing project-level code at scale. However, there is limited empirical evidence on the extent to which they can generate large-scale software systems and on what design issues such systems may exhibit. To address this gap, we conducted a study to explore the capability of Cursor in generating large-scale projects and to evaluate the design quality of the projects it generates.
Research Findings
To guide project generation systematically, we propose a Feature-Driven Human-In-The-Loop (FD-HITL) framework that works from curated project descriptions. Using Cursor with the FD-HITL framework, we generated 10 projects across three application domains and multiple technologies.
Functional Correctness Assessment
We assessed the functional correctness of these projects through manual evaluation, obtaining an average functional correctness score of 91%. This high score indicates that the generated projects largely meet their intended functional requirements.
Design Quality Evaluation
Subsequently, we analyzed the generated projects using two static analysis tools: CodeScene and SonarQube. This analysis aimed to detect design issues within the projects. The results revealed a significant number of design flaws:
- 1,305 design issues in 9 categories identified by CodeScene
- 3,193 issues in 11 categories identified by SonarQube
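To put these totals in proportion, the short sketch below normalizes them per project and per thousand lines of code (KLoC). It uses only figures reported in this summary (10 projects, and the averages of 16,965 LoC and 114 files given under Key Findings). The helper names are illustrative, and the total size is approximated as the average multiplied by the number of projects. The two tools may flag overlapping problems, so their counts are kept separate rather than summed.

```python
# Figures reported in the study summary.
PROJECTS = 10
AVG_LOC = 16_965      # average lines of code per project
AVG_FILES = 114       # average number of files per project
ISSUES = {"CodeScene": 1_305, "SonarQube": 3_193}


def per_project(issues: int, projects: int = PROJECTS) -> float:
    """Average number of reported issues per generated project."""
    return issues / projects


def per_kloc(issues: int, avg_loc: int = AVG_LOC, projects: int = PROJECTS) -> float:
    """Issues per 1,000 lines of code, with total size approximated as avg_loc * projects."""
    total_kloc = avg_loc * projects / 1000
    return issues / total_kloc


def loc_per_file(avg_loc: int = AVG_LOC, avg_files: int = AVG_FILES) -> float:
    """Rough average file size implied by the reported averages."""
    return avg_loc / avg_files


if __name__ == "__main__":
    for tool, count in ISSUES.items():
        print(f"{tool}: {per_project(count):.1f} issues/project, "
              f"{per_kloc(count):.2f} issues/KLoC")
    print(f"Average file size: {loc_per_file():.0f} LoC")
```

Under these assumptions, the counts work out to roughly 130 (CodeScene) and 319 (SonarQube) reported issues per project.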
Key Findings
Our findings highlight several important points regarding the capability and limitations of AI-generated code:
- When used with the FD-HITL framework, Cursor can generate functional large-scale projects averaging 16,965 lines of code (LoC) and 114 files.
- Despite achieving high functional correctness, the generated projects exhibit design issues that may pose long-term maintainability and evolvability risks, necessitating careful review by experienced developers.
- The most prevalent design issues identified include:
  - Code Duplication
  - High Code Complexity
  - Large Methods
  - Framework Best-Practice Violations
  - Exception-Handling Issues
  - Accessibility Issues
- These design issues violate fundamental design principles such as the Single Responsibility Principle (SRP), Separation of Concerns (SoC), and the Don’t Repeat Yourself (DRY) principle.
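As a small illustration of the principles named above, the following sketch shows code organized to avoid two of the listed issue types, duplicated validation logic and overly broad exception handling. It is a hypothetical example written for this summary, not code from the generated projects or the replication package; the function and field names are invented.

```python
# Hypothetical illustration; not taken from the projects analyzed in the study.


def parse_positive_int(raw: str, field: str) -> int:
    """Single home for one validation rule (DRY): parse and range-check a field."""
    try:
        value = int(raw)
    except ValueError as exc:
        # Catch only the expected error and preserve its cause,
        # instead of a bare `except` that hides unrelated failures.
        raise ValueError(f"{field} must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"{field} must be positive, got {value}")
    return value


def create_order(form: dict[str, str]) -> dict[str, int]:
    """Build an order record; validation is delegated to the helper (SRP)."""
    return {
        "quantity": parse_positive_int(form["quantity"], "quantity"),
        "unit_price_cents": parse_positive_int(
            form["unit_price_cents"], "unit_price_cents"
        ),
    }


def order_total_cents(order: dict[str, int]) -> int:
    """Pricing logic kept apart from parsing and validation (SoC)."""
    return order["quantity"] * order["unit_price_cents"]
```

A duplicated version would repeat the parse-and-check block once per field inside one large handler, which is the kind of pattern that duplication and large-method detectors typically flag.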
Conclusion
The study underscores the potential of AI coding tools like Cursor in generating functional large-scale software projects. However, it also emphasizes the critical need for meticulous design oversight. As AI technologies continue to evolve, addressing these design issues will be essential for ensuring the long-term viability and maintainability of AI-generated code.
The replication package for this study can be accessed at https://github.com/Kashifraz/DIinAGP.
