LLMs for Multi-File DSL Code Generation: BMW Case Study

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

A recent study published on arXiv (2604.24678v1) examines how large language models (LLMs) can generate code for domain-specific languages (DSLs) in an industrial setting. Conducted at BMW, the research shows that LLMs can be adapted to produce and modify multi-file DSL artifacts, a task that is harder than single-file generation because changes must respect folder structures and dependencies between files.

Background

Traditional code generation methods often focus on single-file outputs. Repository-scale changes, which can involve intricate folder structures and inter-file dependencies, are left largely unaddressed. The study targets this gap by using LLMs to generate DSL code spanning multiple files from natural language instructions.

Methodology

The research introduces an end-to-end pipeline covering dataset construction, multi-file task representation, model adaptation, and evaluation. Key elements of the methodology include:

  • Dataset Construction: A structured dataset was built around the Xtext-based DSL used at BMW.
  • Multi-File Task Representation: The team encoded DSL folder hierarchies into a structured, path-preserving JSON format, so that both inputs and outputs can describe changes at repository scale.
  • Model Adaptation: Two instruction-tuned code LLMs, Qwen2.5-Coder and DeepSeek-Coder (7B), were compared under several configurations, including baseline prompting, one-shot in-context learning, and fine-tuning.
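To make the path-preserving representation more concrete, here is a minimal Python sketch of how a folder of DSL files could be flattened into JSON and restored again. The summary does not describe the study's actual schema, so the flat `{"files": {path: content}}` layout, the `.dsl` file suffix, and the function names `encode_repo` and `decode_repo` are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def encode_repo(root: Path, suffixes: tuple[str, ...] = (".dsl",)) -> str:
    """Serialize matching files under root as {"files": {relative_path: content}}.

    The schema and the ".dsl" suffix are illustrative assumptions.
    """
    files = {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in suffixes
    }
    return json.dumps({"files": files}, indent=2, sort_keys=True)


def decode_repo(payload: str, root: Path) -> None:
    """Recreate the folder hierarchy described by a JSON payload under root."""
    for rel_path, content in json.loads(payload)["files"].items():
        rel = Path(rel_path)
        # Reject paths that would escape the target folder.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe path in payload: {rel_path}")
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        src_root = Path(src)
        (src_root / "model").mkdir()
        # "entity A {}" is placeholder content, not real BMW DSL syntax.
        (src_root / "model" / "a.dsl").write_text("entity A {}", encoding="utf-8")
        print(encode_repo(src_root))
```

Because every file keeps its relative path as a key, a model's JSON output can be written straight back to disk without guessing where each file belongs.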

Evaluation Criteria

The models were evaluated with both standard similarity metrics and task-specific measures. The criteria included:

  • Edit Correctness: Assessing how accurately the generated code reflects the intended modifications.
  • Repository Structural Fidelity: Ensuring the outputs maintain the integrity and organization of the original codebase.
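The criteria above can be sketched as simple functions over a repository represented as a mapping from file path to file content. The summary does not give the study's exact metric definitions, so the ones below are stand-ins: structural fidelity is approximated as the overlap between predicted and reference path sets, and edit similarity uses `difflib.SequenceMatcher` from the Python standard library, averaged per file.

```python
from difflib import SequenceMatcher

# A repository is modeled as {relative_path: file_content} (an assumption).
Repo = dict[str, str]


def structural_fidelity(pred: Repo, ref: Repo) -> float:
    """Jaccard overlap of the path sets: 1.0 means identical folder layout."""
    pred_paths, ref_paths = set(pred), set(ref)
    if not pred_paths and not ref_paths:
        return 1.0
    return len(pred_paths & ref_paths) / len(pred_paths | ref_paths)


def exact_match(pred: Repo, ref: Repo) -> bool:
    """True only if every path and every file's content agree."""
    return pred == ref


def edit_similarity(pred: Repo, ref: Repo) -> float:
    """Mean character-level similarity per file; missing files count as empty."""
    paths = sorted(set(pred) | set(ref))
    if not paths:
        return 1.0
    scores = [
        SequenceMatcher(None, pred.get(p, ""), ref.get(p, "")).ratio()
        for p in paths
    ]
    return sum(scores) / len(scores)
```

Separating the path check from the content check makes it possible to see whether a model put files in the right places even when the contents are not yet perfect.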

Results

The study reports strong results for multi-file code generation:

  • Fine-tuning produced the largest improvements across all metrics, including high exact-match accuracy.
  • Edit similarity was substantial, and structural fidelity reached 1.00 on the held-out set.
  • One-shot in-context learning gave consistent but smaller gains over baseline prompting.
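The difference between baseline prompting and one-shot in-context learning comes down to whether the prompt includes a worked example before the actual task. The sketch below shows one way to assemble both variants. The prompt template, the function names, and the placeholder DSL snippets are hypothetical and are not taken from the study.

```python
import json
from typing import Optional

# A repository is modeled as {relative_path: file_content} (an assumption).
Repo = dict[str, str]
Example = tuple[str, Repo, Repo]  # (instruction, input repo, expected output repo)


def render_task(instruction: str, repo: Repo, output: Optional[Repo] = None) -> str:
    """Render one task; include the answer only for a worked example."""
    parts = [
        f"Instruction: {instruction}",
        "Input repository (JSON):",
        json.dumps(repo, indent=2, sort_keys=True),
        "Output repository (JSON):",
    ]
    if output is not None:
        parts.append(json.dumps(output, indent=2, sort_keys=True))
    return "\n".join(parts)


def build_prompt(instruction: str, repo: Repo, example: Optional[Example] = None) -> str:
    """Baseline prompt when example is None, one-shot prompt otherwise."""
    blocks = []
    if example is not None:
        blocks.append(render_task(*example))
    blocks.append(render_task(instruction, repo))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = ("Rename entity X to Y", {"x.dsl": "entity X {}"}, {"x.dsl": "entity Y {}"})
    print(build_prompt("Add entity B", {"a.dsl": "entity A {}"}, example=demo))
```

Fine-tuning, by contrast, changes the model weights rather than the prompt, so it would reuse the baseline prompt shape at inference time.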

Practical Implications

To check the practical utility of the generated code, the authors conducted an expert developer survey alongside an execution-based check using existing code generators. The results supported the practical usefulness of LLM-generated DSL code for developers.

Conclusion

This industrial case study at BMW is a notable step in applying LLMs to multi-file DSL code generation. By adapting existing code models to handle complex repository structures, the research suggests that repository-scale DSL changes can be generated from natural language instructions, which could change how DSL-based software is built and maintained in enterprise environments.

Lazarus Omolua