DVM: Fast Real-Time Kernel Generation for AI Models

DVM: Real-Time Kernel Generation for Dynamic AI Models

Summary: arXiv:2603.24239v1

Abstract: Dynamism is common in AI computation, e.g., the dynamic tensor shapes and the dynamic control flows in models. Due to the long compilation time, existing runtime compilation damages the model efficiency, while the offline compilers either suffer from the long compilation time and device memory footprint to cover all the possible execution instances of a dynamic model, or sacrifice optimization opportunities for usability.

In this paper, we rethink the feasibility of runtime compilation for dynamic models and identify that the key for it to work is to speed up the compilation or hide the compilation overhead. To do this, we propose a real-time compiler, DVM.

Overview of DVM

DVM, or Dynamic Virtual Machine, is a real-time compiler for dynamic AI models. It targets the main obstacle to runtime compilation, long compilation time, by making compilation cheap enough to run for each operator instance. Its key features are:

Runtime Operator Compiler: DVM is built on a runtime operator compiler that uses a bytecode virtual machine, so each dynamic operator instance can be compiled efficiently based on its input.
Bytecode Encoding: Instead of compiling programs directly into machine code, DVM encodes operator programs into bytecode on the CPU. This bytecode is then decoded into virtual instructions that can be executed directly on the NPU (Neural Processing Unit).
Operator Fusion: DVM introduces an operator fuser that enhances performance through symbol-deduction-based fusion on static graphs and runtime fusion on dynamic graphs. This dual approach increases the number of fusion opportunities available.
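
The bytecode idea above can be illustrated with a toy example. The following is a minimal sketch, not DVM's actual design: the opcode set, the byte layout, and the names `Op`, `compile_instance`, and `run` are all hypothetical, and the interpreter runs in plain Python on the CPU rather than on an NPU. It shows a fused elementwise chain, relu(a * b + c), being encoded into a compact byte string for one specific input size and then decoded into instructions at execution time, with no machine-code generation step.

```python
import struct
from enum import IntEnum


class Op(IntEnum):
    """Hypothetical virtual instruction set (not DVM's real opcodes)."""
    LOAD = 0   # push input buffer number `arg` onto the stack
    ADD = 1    # pop two buffers, push their elementwise sum
    MUL = 2    # pop two buffers, push their elementwise product
    RELU = 3   # pop one buffer, push max(x, 0) per element
    STORE = 4  # pop the result and finish


# A fused operator relu(a * b + c), written as stack-machine instructions.
FUSED = [(Op.LOAD, 0), (Op.LOAD, 1), (Op.MUL, 0),
         (Op.LOAD, 2), (Op.ADD, 0), (Op.RELU, 0), (Op.STORE, 0)]


def compile_instance(program, num_elements):
    """Encode one operator instance: a uint32 element count, then 3 bytes per instruction."""
    header = struct.pack("<I", num_elements)
    body = b"".join(struct.pack("<BH", int(op), arg) for op, arg in program)
    return header + body


def run(bytecode, inputs):
    """Decode the bytecode instruction by instruction and execute it."""
    (num_elements,) = struct.unpack_from("<I", bytecode, 0)
    if any(len(buf) != num_elements for buf in inputs):
        raise ValueError("input size does not match the compiled instance")
    stack = []
    for offset in range(4, len(bytecode), 3):
        code, arg = struct.unpack_from("<BH", bytecode, offset)
        op = Op(code)
        if op == Op.LOAD:
            stack.append(list(inputs[arg]))
        elif op == Op.ADD:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append([x + y for x, y in zip(lhs, rhs)])
        elif op == Op.MUL:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append([x * y for x, y in zip(lhs, rhs)])
        elif op == Op.RELU:
            stack.append([max(x, 0.0) for x in stack.pop()])
        elif op == Op.STORE:
            return stack.pop()
    raise ValueError("program ended without STORE")


# Each input size gets its own cheaply built bytecode instance.
bytecode = compile_instance(FUSED, 2)
print(run(bytecode, [[1.0, -2.0], [3.0, 4.0], [0.5, 0.5]]))  # [3.5, 0.0]
```

In this sketch, "compiling" for a new input size is only a few `struct.pack` calls, which is the intuition behind encoding to bytecode instead of generating machine code. How DVM actually lays out its bytecode and decodes it on the NPU is not described in this summary.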

Performance Evaluation

DVM was evaluated against TorchInductor, PyTorch-eager, and MindSpore-graph-O0. The reported results:

DVM achieves up to 11.77 times better operator/model efficiency than these baselines.
Its maximum compilation time is up to five orders of magnitude shorter.

Conclusion

DVM tackles the long compilation time that has made runtime compilation impractical for dynamic models. By encoding operators as bytecode rather than machine code and by fusing operators on both static and dynamic graphs, it handles dynamic tensor shapes and control flows more efficiently than the compared baselines, while keeping compilation fast enough to run in real time.
