AscendKernelGen: LLM-Based Kernel Generation for NPUs

AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Source: arXiv:2601.07160v2

Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain.

Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development.

Key Components of AscendKernelGen

  • Ascend-CoT: A high-quality dataset that incorporates chain-of-thought reasoning derived from real-world kernel implementations.
  • KernelGen-LM: A domain-adaptive model trained via supervised fine-tuning and reinforcement learning with execution feedback.
  • NPUKernelBench: A comprehensive benchmark for assessing compilation, correctness, and performance across varying complexity levels.
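The summary does not say how the three evaluation stages (compilation, correctness, performance) are combined, or how execution feedback becomes a training signal. As a minimal sketch only, assuming a staged scheme in which each stage gates the next, the mapping from an evaluation outcome to a scalar score could look like the following. The names KernelResult and staged_score and all of the weights are hypothetical and are not taken from AscendKernelGen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KernelResult:
    """Outcome of evaluating one generated kernel (hypothetical structure)."""
    compiled: bool
    correct: bool = False
    # Speedup relative to a reference kernel, if it was measured.
    speedup: Optional[float] = None


def staged_score(result: KernelResult) -> float:
    """Map a staged evaluation outcome to a score in [0, 1].

    The weights are illustrative assumptions: a kernel that fails to
    compile scores 0, one that compiles but is wrong gets partial credit,
    and a correct kernel earns a bonus that grows with its speedup.
    """
    if not result.compiled:
        return 0.0
    if not result.correct:
        return 0.2
    bonus = 0.0
    if result.speedup is not None:
        # Clamp the speedup to [0, 1] so the bonus stays bounded.
        bonus = 0.3 * min(max(result.speedup, 0.0), 1.0)
    return 0.7 + bonus
```

Gating each stage on the previous one mirrors the order in which a kernel can fail: performance is meaningless for a kernel that produces wrong output, and correctness cannot be checked for one that does not compile.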

Experimental Results

The experiments show that the approach narrows the gap between general-purpose LLMs and hardware-specific coding. On complex Level-2 kernels, the compilation success rate rises from 0% to 95.5% (Pass@10), and functional correctness reaches 64.3%, where the baseline produced no correct kernels.
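The summary does not define Pass@10. A common reading, assumed here, is the pass@k metric used in code generation benchmarks: the probability that at least one of k samples drawn from n generated candidates passes, given that c of the n passed. The sketch below uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k); whether NPUKernelBench computes it this way is an assumption, and the function name pass_at_k is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of candidates generated for the task
    c: number of those candidates that passed the check
    k: number of samples the metric allows (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k candidates failed, every size-k sample contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 10, the estimator reduces to a simple rule: a task counts as passed if any of its 10 candidates passes. The benchmark-level figure is then the mean of this value over all tasks.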

The Importance of Domain-Specific Reasoning

These results highlight the role of domain-specific reasoning and rigorous evaluation in automating accelerator-aware code generation. AscendKernelGen addresses limitations that general-purpose LLMs show in this domain, and its benchmark gives later work on NPU kernel generation a common basis for comparison.

Availability

AscendKernelGen is available at the following links:

In conclusion, AscendKernelGen pairs a reasoning-focused dataset and a domain-adapted model with an execution-based benchmark, and in doing so makes LLM-based kernel generation for Ascend NPUs considerably more reliable than general-purpose models allow.


Lazarus Omolua
https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone (students, founders, creators, and businesses) the tools to compete globally.
