Design Challenges in AI IDE-Generated Large-Scale Code

Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects

Summary: arXiv:2604.06373v1 Announce Type: cross

Abstract: New generation of AI coding tools, including AI-powered IDEs equipped with agentic capabilities, can generate code within the context of the project. These AI IDEs are increasingly perceived as capable of producing project-level code at scale. However, there is limited empirical evidence on the extent to which they can generate large-scale software systems and what design issues such systems may exhibit. To address this gap, we conducted a study to explore the capability of Cursor in generating large-scale projects and to evaluate the design quality of projects generated by Cursor.

Research Findings

To systematically guide project generation, we propose a Feature-Driven Human-In-The-Loop (FD-HITL) framework, utilizing curated project descriptions. Our study involved generating 10 projects using Cursor with the FD-HITL framework across three application domains and multiple technologies.

Functional Correctness Assessment

We assessed the functional correctness of these projects through manual evaluation, obtaining an average functional correctness score of 91%. This high score indicates that the generated projects largely meet their intended functional requirements.

Design Quality Evaluation

Subsequently, we analyzed the generated projects using two static analysis tools: CodeScene and SonarQube. This analysis aimed to detect design issues within the projects. The results revealed a significant number of design flaws:

1,305 design issues categorized into 9 categories by CodeScene
3,193 issues in 11 categories identified by SonarQube

Key Findings

Our findings highlight several important points regarding the capability and limitations of AI-generated code:

When used with the FD-HITL framework, Cursor can generate functional large-scale projects averaging 16,965 lines of code (LoC) and 114 files.
Despite achieving high functional correctness, the generated projects exhibit design issues that may pose long-term maintainability and evolvability risks, necessitating careful review by experienced developers.
The most prevalent design issues identified include:

Code Duplication
High Code Complexity
Large Methods
Framework Best-Practice Violations
Exception-Handling Issues
Accessibility Issues

These design issues violate fundamental design principles such as the Single Responsibility Principle (SRP), Separation of Concerns (SoC), and the Don’t Repeat Yourself (DRY) principle.

Conclusion

The study underscores the potential of AI coding tools like Cursor in generating functional large-scale software projects. However, it also emphasizes the critical need for meticulous design oversight. As AI technologies continue to evolve, addressing these design issues will be essential for ensuring the long-term viability and maintainability of AI-generated code.

The replication package for this study can be accessed at https://github.com/Kashifraz/DIinAGP.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!

Company

Design Challenges in AI IDE-Generated Large-Scale Code

Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects

Research Findings

Functional Correctness Assessment

Design Quality Evaluation

Key Findings

Conclusion

Related AI Insights

Subscribe

More like thisRelated

About us

Company

The latest

Subscribe

RichlyAI Blog
AI Guide, Tutorials, Industrial Insights, & more!

More like this
Related