FastCache: Accelerate Diffusion Transformer with Caching

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

arXiv:2505.20353v3

Type: replace-cross

Abstract

Diffusion Transformers (DiT) are powerful generative models, but they are computationally intensive. Inference is iterative, and each denoising step runs a deep transformer stack, so latency and memory consumption grow with both the depth of the model and the number of steps.

To address these inefficiencies, we introduce FastCache, a hidden-state-level caching and compression framework. FastCache is designed to accelerate DiT inference by leveraging redundancy present within the internal representations of the model.

Key Features of FastCache

FastCache employs a dual strategy to enhance performance:

Spatial-aware Token Selection: This mechanism adaptively filters redundant tokens based on hidden-state saliency, so that computation is concentrated on the tokens that carry the most information.
Transformer-level Cache: By reusing latent activations across timesteps, FastCache skips recomputation whenever the change in hidden states from one timestep to the next falls below a threshold.
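
The two strategies above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The saliency measure (squared change of each token's hidden state between two timesteps), the relative-change test, and the names token_saliency, split_tokens, relative_change, and CachedBlock are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Token = List[float]   # one token's hidden-state vector
Hidden = List[Token]  # hidden states of all tokens at one timestep


def token_saliency(curr: Hidden, prev: Hidden) -> List[float]:
    """Squared L2 change of each token between two timesteps (assumed saliency)."""
    return [sum((c - p) ** 2 for c, p in zip(ct, pt)) for ct, pt in zip(curr, prev)]


def split_tokens(curr: Hidden, prev: Hidden, tau: float) -> Tuple[List[int], List[int]]:
    """Return (salient, redundant) token indices; tau is a hypothetical threshold."""
    scores = token_saliency(curr, prev)
    salient = [i for i, s in enumerate(scores) if s > tau]
    redundant = [i for i, s in enumerate(scores) if s <= tau]
    return salient, redundant


def relative_change(curr: Hidden, prev: Hidden) -> float:
    """Frobenius norm of the change divided by the norm of the previous state."""
    diff = sum(token_saliency(curr, prev))
    base = sum(v * v for tok in prev for v in tok)
    return (diff / base) ** 0.5 if base > 0 else float("inf")


class CachedBlock:
    """Wraps a transformer block and reuses its last output when the input barely moved."""

    def __init__(self, block: Callable[[Hidden], Hidden], threshold: float) -> None:
        self.block = block
        self.threshold = threshold
        self.prev_input: Optional[Hidden] = None
        self.prev_output: Optional[Hidden] = None
        self.hits = 0  # number of times the block was skipped

    def __call__(self, hidden: Hidden) -> Hidden:
        if (self.prev_input is not None and self.prev_output is not None
                and relative_change(hidden, self.prev_input) < self.threshold):
            self.hits += 1
            return self.prev_output  # cache hit: skip the block
        out = self.block(hidden)     # cache miss: run the block
        self.prev_input, self.prev_output = hidden, out
        return out
```

In this sketch the cached input is only refreshed on a miss, so each comparison is made against the input that actually produced the cached output rather than against the most recent timestep.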

Performance and Evaluation

Together, these modules reduce computation while preserving the fidelity of generated outputs. Theoretical analysis shows that FastCache maintains a bounded approximation error through a hypothesis-testing-based decision rule. In other words, cached activations are reused only when the test finds no significant change, which limits how far the accelerated output can drift from the full computation.
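
The summary above does not spell out the decision rule, so the following is only a sketch of what a hypothesis-testing-based cache decision could look like. It assumes that, under the null hypothesis of no meaningful change, the entries of the hidden-state difference are independent Gaussian noise with a known scale sigma, so the scaled sum of squares follows a chi-square distribution. The function names, sigma, and alpha are hypothetical, and the quantile uses the Wilson-Hilferty approximation to stay within the standard library.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def chi2_quantile(p: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def can_reuse_cache(curr: Sequence[float], prev: Sequence[float],
                    sigma: float, alpha: float = 0.05) -> bool:
    """True when the change between timesteps is not statistically significant."""
    # Test statistic: sum of squared, noise-normalised differences (assumed model).
    stat = sum(((c - p) / sigma) ** 2 for c, p in zip(curr, prev))
    return stat <= chi2_quantile(1.0 - alpha, len(curr))
```

Under these assumptions, a reused activation always satisfies stat <= quantile, so the squared change it hides is at most sigma squared times the quantile. This is one simple way such a rule yields a bounded error.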

Empirical evaluations across multiple DiT variants show substantial reductions in both latency and memory usage. Among the caching methods compared, FastCache achieves the best generation quality as measured by Fréchet Inception Distance (FID) and temporal Fréchet Inception Distance (t-FID).
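
For reference, the standard definition of FID (general background, not specific to FastCache) compares Gaussian fits to Inception features of real and generated images. Here \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the feature mean and covariance of the real and generated sets; lower values indicate closer distributions.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```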

Token Merging Module

To increase the speedup further, we also introduce a token merging module. It merges redundant tokens based on k-nearest neighbor (k-NN) density, reducing the number of tokens the rest of the pipeline has to process.
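
As a rough illustration of density-based merging, the sketch below scores each token by how close its k nearest neighbors are and then greedily merges tokens that sit near a denser one. The density formula, the greedy clustering, and the parameters k and radius are assumptions chosen for clarity; the actual module may differ.

```python
from math import exp
from typing import List

Token = List[float]


def sq_dist(a: Token, b: Token) -> float:
    """Squared Euclidean distance between two tokens."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_density(tokens: List[Token], k: int) -> List[float]:
    """exp(-mean squared distance to the k nearest neighbors); higher means more crowded."""
    densities = []
    for i, t in enumerate(tokens):
        dists = sorted(sq_dist(t, u) for j, u in enumerate(tokens) if j != i)
        nearest = dists[:k]
        densities.append(exp(-sum(nearest) / len(nearest)) if nearest else 0.0)
    return densities


def merge_tokens(tokens: List[Token], k: int, radius: float) -> List[Token]:
    """Greedy merge from densest to sparsest token.

    A token within `radius` (squared distance) of an existing center joins
    that cluster; otherwise it starts a new one. Each cluster is replaced
    by the mean of its members.
    """
    density = knn_density(tokens, k)
    order = sorted(range(len(tokens)), key=lambda i: -density[i])
    centers: List[int] = []
    clusters: List[List[int]] = []
    for i in order:
        best = min(range(len(centers)),
                   key=lambda c: sq_dist(tokens[i], tokens[centers[c]]),
                   default=None)
        if best is not None and sq_dist(tokens[i], tokens[centers[best]]) <= radius:
            clusters[best].append(i)
        else:
            centers.append(i)
            clusters.append([i])
    dim = len(tokens[0]) if tokens else 0
    return [[sum(tokens[i][d] for i in members) / len(members) for d in range(dim)]
            for members in clusters]
```

With three nearly identical tokens and one outlier, this sketch returns two tokens: the mean of the crowded group and the outlier unchanged.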

Conclusion

FastCache targets the main inference costs of Diffusion Transformers. By combining hidden-state caching with saliency-based token selection and k-NN token merging, it reduces latency and memory usage while keeping the approximation error bounded. For those interested in exploring the implementation, the code is available at https://github.com/NoakLiu/FastCache-xDiT.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
