AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
arXiv:2601.07160v2 (replacement announcement)
Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain.
Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development.
Key Components of AscendKernelGen
- Ascend-CoT: A high-quality dataset that incorporates chain-of-thought reasoning derived from real-world kernel implementations.
- KernelGen-LM: A domain-adaptive model trained via supervised fine-tuning and reinforcement learning with execution feedback.
- NPUKernelBench: A comprehensive benchmark for assessing compilation, correctness, and performance across varying complexity levels.
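The staged assessment described above (compilation, then correctness, then performance) and the idea of execution feedback can be sketched as follows. This is a minimal illustration, not the framework's actual code: `EvalResult`, `evaluate_candidate`, `execution_reward`, the injected `compile_fn`/`run_fn`/`time_fn` callables, the tolerance, and the reward values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class EvalResult:
    compiled: bool
    correct: bool
    speedup: Optional[float]  # reference_time / candidate_time; None if not measured


def evaluate_candidate(
    compile_fn: Callable[[str], bool],
    run_fn: Callable[[str, Sequence[float]], Sequence[float]],
    time_fn: Callable[[str], float],
    source: str,
    inputs: Sequence[float],
    expected: Sequence[float],
    reference_time: float,
    tol: float = 1e-3,
) -> EvalResult:
    """Evaluate one generated kernel in three gated stages.

    The three callables stand in for a real toolchain (hypothetical):
    compile the source, run it on inputs, and time it.
    """
    # Stage 1: compilation. A kernel that does not build is never run.
    if not compile_fn(source):
        return EvalResult(compiled=False, correct=False, speedup=None)

    # Stage 2: functional correctness against reference outputs.
    outputs = run_fn(source, inputs)
    correct = len(outputs) == len(expected) and all(
        abs(o - e) <= tol for o, e in zip(outputs, expected)
    )
    if not correct:
        return EvalResult(compiled=True, correct=False, speedup=None)

    # Stage 3: performance, measured only for correct kernels.
    candidate_time = time_fn(source)
    speedup = reference_time / candidate_time if candidate_time > 0 else None
    return EvalResult(compiled=True, correct=True, speedup=speedup)


def execution_reward(result: EvalResult) -> float:
    """Map an evaluation result to a scalar reward (illustrative values only)."""
    if not result.compiled:
        return 0.0
    if not result.correct:
        return 0.3  # partial credit for compiling; an assumption, not from the paper
    return 1.0
```

Gating each stage on the previous one keeps the three metrics separable: a compilation rate can be reported over all candidates, while correctness and speedup are only defined for candidates that reach those stages.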
Experimental Results
Experimental results show that our approach substantially narrows the gap between general-purpose LLMs and hardware-specific coding. Specifically, on complex Level-2 kernels the compilation success rate improves from 0% to 95.5% (Pass@10), and functional correctness reaches 64.3%, whereas the baseline fails completely.
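For readers unfamiliar with the Pass@10 notation, the sketch below shows the widely used unbiased pass@k estimator from the code-generation literature: given n sampled candidates per task, of which c succeed, it estimates the probability that at least one of k randomly drawn candidates succeeds. Whether the paper computes Pass@10 exactly this way is an assumption; the function names `pass_at_k` and `mean_pass_at_k` are illustrative.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: candidates sampled for the task
    c: candidates that passed (e.g. compiled, or produced correct output)
    k: budget of candidates considered
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failures exist, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as an (n, c) pair."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no tasks given")
    return sum(scores) / len(scores)
```

With exactly k samples per task (n = k), the estimate for a task is 1 if any candidate passed and 0 otherwise, so the benchmark-level score is simply the fraction of tasks with at least one passing candidate.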
The Importance of Domain-Specific Reasoning
These results highlight the critical role of domain-specific reasoning and rigorous evaluation in automating accelerator-aware code generation. By pairing chain-of-thought training data with execution-based evaluation, AscendKernelGen addresses limitations that general-purpose LLMs show when generating NPU kernels.
Availability
AscendKernelGen is available at the following links:
In conclusion, AscendKernelGen combines a reasoning-focused dataset (Ascend-CoT), a domain-adapted model (KernelGen-LM), and an execution-based benchmark (NPUKernelBench) to move LLM-based kernel generation for Ascend NPUs from near-zero success to largely compilable and frequently correct kernels.
