FastCache: Accelerate Diffusion Transformer with Caching

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

arXiv:2505.20353v3

Abstract

Diffusion Transformers (DiT) have emerged as powerful generative models with impressive results across a range of applications. However, they are computationally intensive. Sampling runs many denoising timesteps, and each timestep passes through a deep stack of transformer blocks, so inference consumes substantial compute and memory.

To address these inefficiencies, we introduce FastCache, a hidden-state-level caching and compression framework. FastCache is designed to accelerate DiT inference by leveraging redundancy present within the internal representations of the model.

Key Features of FastCache

FastCache combines two complementary mechanisms:

  • Spatial-aware Token Selection: This mechanism adaptively filters redundant tokens based on hidden-state saliency, so that full computation is spent only on the most informative tokens.
  • Transformer-level Cache: FastCache reuses latent activations across timesteps when the change between steps is statistically insignificant, skipping computation that would produce nearly identical results.
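The token-selection idea can be illustrated in a few lines. This is a minimal sketch, not FastCache's implementation. It assumes that saliency is the L2 norm of each token's hidden-state change between consecutive timesteps, that a fixed fraction of the most salient tokens is recomputed, and that the remaining positions are filled from the previous step's cached outputs. The names `token_saliency`, `select_salient_tokens`, `blend_with_cache`, and the `keep_ratio` parameter are hypothetical.

```python
import math


def token_saliency(prev_hidden, curr_hidden):
    """Assumed saliency: L2 norm of each token's hidden-state change between timesteps."""
    return [math.dist(p, c) for p, c in zip(prev_hidden, curr_hidden)]


def select_salient_tokens(prev_hidden, curr_hidden, keep_ratio=0.5):
    """Return the (sorted) indices of the most-changed tokens, which get full recomputation."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = token_saliency(prev_hidden, curr_hidden)
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def blend_with_cache(selected, fresh_outputs, cached_outputs):
    """Overwrite cached outputs at the selected positions with freshly computed ones."""
    out = list(cached_outputs)  # copy so the cache itself is left untouched
    for idx, value in zip(selected, fresh_outputs):
        out[idx] = value
    return out


# Four tokens with 2-d hidden states; tokens 2 and 3 change the most.
prev = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
curr = [[0.0, 0.1], [1.0, 1.0], [5.0, 2.0], [3.0, 3.5]]
print(select_salient_tokens(prev, curr, keep_ratio=0.5))  # [2, 3]
```

In a real transformer block, unselected tokens still act as attention context for the selected ones, which this toy version does not model.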

Performance and Evaluation

Combining these modules reduces computational demands while preserving the fidelity of generated outputs. On the theoretical side, the authors show that FastCache's hypothesis-testing-based decision rule keeps the approximation error bounded, so the speedup does not come at the cost of uncontrolled quality loss.
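The article does not spell out the test statistic, so the following is only one plausible reading of a hypothesis-testing cache rule, with every detail assumed. Under the null hypothesis that the hidden state has not meaningfully changed, the element-wise differences are treated as i.i.d. Gaussian noise with a known variance `noise_var`, which makes the normalized squared change chi-square distributed. The cache is reused when that statistic stays below the chi-square critical value at level `alpha`. The critical value uses the Wilson-Hilferty approximation so the sketch needs only the standard library. The function names are hypothetical, and per the paper's title FastCache also uses a learnable linear approximation on the cached path, which is not modeled here.

```python
import math
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Upper-alpha critical value of chi-square(df), via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def should_reuse_cache(prev_hidden, curr_hidden, noise_var, alpha=0.05):
    """Hypothetical rule: reuse the cache unless the change is statistically significant."""
    diffs = [
        c - p
        for prow, crow in zip(prev_hidden, curr_hidden)
        for p, c in zip(prow, crow)
    ]
    # Under H0 (pure noise with variance noise_var), stat ~ chi-square(len(diffs)).
    stat = sum(d * d for d in diffs) / noise_var
    return stat <= chi2_critical(len(diffs), alpha)


prev = [[0.0] * 4 for _ in range(4)]
nearly_same = [[0.01] * 4 for _ in range(4)]
changed = [[1.0] * 4 for _ in range(4)]
print(should_reuse_cache(prev, nearly_same, noise_var=0.01))  # True
print(should_reuse_cache(prev, changed, noise_var=0.01))      # False
```

Lowering `alpha` raises the critical value and therefore makes reuse more likely, trading fidelity for speed.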

Empirical evaluations across multiple DiT variants show substantial reductions in both latency and memory usage. FastCache also achieves the best generation quality among the caching methods compared, as measured by Fréchet Inception Distance (FID) and temporal Fréchet Inception Distance (t-FID).

Token Merging Module

To push the speedup further, FastCache also includes a token merging module that merges redundant tokens based on k-nearest-neighbor (k-NN) density, reducing the number of tokens the transformer has to process.
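A toy version of density-based merging is sketched below. For illustration only, it assumes that a token's density is the inverse of its mean distance to its `k` nearest neighbors, and that the `num_merge` densest tokens are averaged into their nearest surviving neighbor. The functions `knn_density` and `merge_dense_tokens` are hypothetical names; FastCache's actual merging rule and weighting may differ.

```python
import math


def knn_density(tokens, k):
    """Assumed density: inverse of the mean distance to the k nearest other tokens."""
    if not 1 <= k < len(tokens):
        raise ValueError("k must be between 1 and len(tokens) - 1")
    densities = []
    for i, t in enumerate(tokens):
        dists = sorted(math.dist(t, u) for j, u in enumerate(tokens) if j != i)
        mean_d = sum(dists[:k]) / k
        densities.append(1.0 / (mean_d + 1e-8))  # epsilon avoids division by zero
    return densities


def merge_dense_tokens(tokens, k=2, num_merge=1):
    """Average the num_merge densest tokens into their nearest kept neighbor."""
    if not 0 <= num_merge < len(tokens):
        raise ValueError("num_merge must leave at least one token")
    dens = knn_density(tokens, k)
    order = sorted(range(len(tokens)), key=lambda i: dens[i], reverse=True)
    sources = set(order[:num_merge])
    keep = [i for i in range(len(tokens)) if i not in sources]
    groups = {i: [list(tokens[i])] for i in keep}  # each kept token collects its members
    for s in sources:
        target = min(keep, key=lambda j: math.dist(tokens[s], tokens[j]))
        groups[target].append(list(tokens[s]))
    # Each output token is the element-wise mean of its group.
    return [[sum(col) / len(groups[i]) for col in zip(*groups[i])] for i in keep]


tokens = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
print(merge_dense_tokens(tokens, k=2, num_merge=1))
# [[0.05, 0.0], [0.0, 0.1], [5.0, 5.0]]
```

Because merged tokens are plain averages, a fuller version might also track how many original tokens each output represents; the sketch omits that bookkeeping.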

Conclusion

FastCache is a notable step toward making generative modeling with Diffusion Transformers cheaper to run. By skipping computation on redundant tokens and reusing cached activations when they change little, it addresses the inference cost that has traditionally limited these models. For those interested in exploring the implementation, the code is available at https://github.com/NoakLiu/FastCache-xDiT.


Lazarus Omolua
https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
