LLMs for Multi-File DSL Code Generation: BMW Case Study

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

A recent study published on arXiv (2604.24678v1) examines how large language models (LLMs) can generate code for domain-specific languages (DSLs) in an industrial setting. The research, conducted at BMW, shows how LLMs can be adapted to produce and modify multi-file DSL artifacts, a task that is harder than single-file generation because enterprise repositories involve nested folder structures and dependencies between files.

Background

Traditional code generation methods often focus on single-file outputs. That leaves a gap for repository-scale changes, which can involve intricate folder structures and inter-file dependencies. The study addresses this gap by using LLMs to generate DSL code spanning multiple files from simple natural language instructions.

Methodology

The research introduces a comprehensive pipeline that encompasses dataset construction, multi-file task representation, model adaptation, and evaluation. Key elements of the methodology include:

Dataset Construction: A structured dataset was built around the Xtext-based DSL used at BMW.
Multi-File Task Representation: The team encoded DSL folder hierarchies into a structured, path-preserving JSON format, which makes it possible to generate outputs at repository scale.
Model Adaptation: Two instruction-tuned code LLMs, Qwen2.5-Coder and DeepSeek-Coder (7B), were evaluated under several configurations, including baseline prompting, one-shot in-context learning, and fine-tuning.
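
To make the path-preserving JSON idea concrete, here is a minimal Python sketch. The article does not give the schema used in the study, so the flat "relative path to file content" mapping, the "files" key, and the function names encode_repo and decode_repo are all assumptions made for illustration.

```python
import json
from pathlib import Path


def encode_repo(root: Path) -> str:
    """Serialize every file under `root` into JSON keyed by its relative path.

    Assumed schema (not taken from the study): {"files": {"dir/file": "content"}}.
    """
    files = {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
    return json.dumps({"files": files}, indent=2)


def decode_repo(payload: str, root: Path) -> None:
    """Write a payload produced by `encode_repo` back to disk under `root`."""
    for rel_path, content in json.loads(payload)["files"].items():
        target = root / rel_path
        # Recreate intermediate folders so the original hierarchy is preserved.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
```

Because each key carries the full relative path, a model that emits this structure is describing where every file lives as well as what it contains, and the output can be written straight back into a folder tree. A production version would also need to reject paths that escape the target folder.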

Evaluation Criteria

The models were evaluated with both standard similarity metrics and task-specific measures. The criteria included:

Edit Correctness: Assessing how accurately the generated code reflects the intended modifications.
Repository Structural Fidelity: Ensuring the outputs maintain the integrity and organization of the original codebase.
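
The following Python sketch shows one way such measures could be computed over a repository represented as a mapping from file path to file content. The article does not state the study's formulas, so these definitions are assumptions: edit similarity is approximated with the standard library's difflib ratio, and structural fidelity is taken to be the overlap between predicted and reference path sets.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def exact_match(pred: dict[str, str], ref: dict[str, str]) -> bool:
    """True only if every path and every file content is identical."""
    return pred == ref


def edit_similarity(pred: dict[str, str], ref: dict[str, str]) -> float:
    """Mean per-file similarity in [0, 1]; a missing file is compared to ""."""
    paths = sorted(set(pred) | set(ref))
    if not paths:
        return 1.0
    scores = [
        SequenceMatcher(None, pred.get(p, ""), ref.get(p, "")).ratio()
        for p in paths
    ]
    return sum(scores) / len(scores)


def structural_fidelity(pred: dict[str, str], ref: dict[str, str]) -> float:
    """Jaccard overlap of the two path sets (1.0 means the same file layout)."""
    pred_paths, ref_paths = set(pred), set(ref)
    if not pred_paths and not ref_paths:
        return 1.0
    return len(pred_paths & ref_paths) / len(pred_paths | ref_paths)
```

Under these assumed definitions, an output can score 1.0 on structural fidelity while still failing exact match, which is why the two kinds of measure are reported separately.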

Results

The study reports strong results for multi-file code generation:

Fine-tuning produced the largest improvements across all metrics and reached high exact-match accuracy.
Edit similarity was substantial, and structural fidelity reached 1.00 on the held-out set.
One-shot in-context learning gave consistent but smaller gains over baseline prompting.

Practical Implications

To further validate the practical utility of the generated code, an expert developer survey was conducted alongside an execution-based check using existing code generators. The results supported the usefulness of the approach in practice as a way for developers to streamline DSL code generation.

Conclusion

This industrial case study at BMW is a notable step forward in applying LLMs to multi-file DSL code generation. By adapting instruction-tuned code models to handle complex repository structures, the research opens up new possibilities for developers and could change how software is built and maintained in enterprise environments.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
