DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Summary: arXiv:2511.19365v2
Abstract: Pixel diffusion aims to generate images directly in pixel space in an end-to-end fashion. This approach avoids the limitations of Variational Autoencoders (VAE) in two-stage latent diffusion, offering higher model capacity. Existing pixel diffusion models suffer from slow training and inference, as they typically model both high-frequency signals and low-frequency semantics within a single diffusion transformer (DiT). To pursue a more efficient pixel diffusion paradigm, we propose the frequency-DeCoupled pixel diffusion framework.
Introduction
Pixel diffusion generates images directly in pixel space, but existing methods are slow to train and sample from because a single diffusion transformer (DiT) has to model high-frequency detail and low-frequency semantics at the same time. The DeCo framework separates the two: the DiT concentrates on low-frequency semantics while a lightweight pixel decoder generates high-frequency detail. The authors present this split as a route to more efficient training and inference.
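To make the idea of low- and high-frequency components concrete, here is a minimal NumPy sketch that splits a 2D array into the two parts. It uses an ideal FFT low-pass mask purely for illustration; this is not DeCo's mechanism (DeCo separates the two through its architecture), and the names `split_frequencies` and `cutoff` are hypothetical.

```python
import numpy as np

def split_frequencies(image, cutoff):
    """Split a 2D array into low- and high-frequency parts.

    Illustrative only: an ideal low-pass mask in the FFT domain.
    `cutoff` is a radius in cycles per pixel (0.5 is the Nyquist limit).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies
    mask = np.sqrt(fx**2 + fy**2) <= cutoff

    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * mask).real  # coarse structure
    high = image - low                        # fine detail (residual)
    return low, high

# Toy input: random values standing in for a grayscale image.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
low, high = split_frequencies(image, cutoff=0.1)
```

The two parts sum back to the input, so the split loses nothing; it only assigns each frequency band to one side.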
Key Features of DeCo
- Decoupled Generation: A lightweight pixel decoder generates high-frequency details, conditioned on semantic guidance from the DiT. This frees the DiT to specialize in low-frequency semantics.
- Frequency-aware Flow-matching Loss: The loss emphasizes visually salient frequencies and suppresses insignificant ones, with the aim of improving the quality of generated images.
- Performance Metrics: On ImageNet, DeCo reports a Fréchet Inception Distance (FID, lower is better) of 1.62 at 256×256 and 2.22 at 512×512, narrowing the gap between pixel diffusion and latent diffusion methods.
- Leading Text-to-Image Model: In a system-level comparison, DeCo’s pretrained text-to-image model achieves a leading score of 0.86 on GenEval.
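The frequency-aware loss described above can be illustrated with a small sketch. The code below weights the squared velocity error per frequency after an orthonormal 2D DCT. The linear interpolation path, the 8×8 patch size, and the decaying weight table are all assumptions chosen for illustration, not the weighting used by DeCo; `dct_matrix` and `frequency_weighted_fm_loss` are hypothetical names.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so rows are orthonormal
    return basis

def frequency_weighted_fm_loss(v_pred, v_target, weights):
    """Mean of weights * squared DCT coefficients of the velocity error.

    Assumes square 2D inputs of the same shape as `weights`.
    """
    c = dct_matrix(v_pred.shape[0])
    err_freq = c @ (v_pred - v_target) @ c.T  # 2D DCT of the error
    return float(np.mean(weights * err_freq**2))

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal((n, n))     # toy "data" patch
eps = rng.standard_normal((n, n))   # Gaussian noise
# Assumed linear path x_t = t * x + (1 - t) * eps, so the target velocity is x - eps.
v_target = x - eps
v_pred = v_target + 0.1 * rng.standard_normal((n, n))  # stand-in for a model output

# Hypothetical weight table: low frequencies count more than high ones.
k = np.arange(n)
weights = 1.0 / (1.0 + k[:, None] + k[None, :])

loss_weighted = frequency_weighted_fm_loss(v_pred, v_target, weights)
loss_uniform = frequency_weighted_fm_loss(v_pred, v_target, np.ones((n, n)))
loss_mse = float(np.mean((v_pred - v_target) ** 2))
```

With uniform weights the loss reduces to the ordinary mean squared error, because an orthonormal DCT preserves the sum of squares. Any non-uniform table then shifts the training signal toward the frequencies it rates as more important.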
Conclusion
DeCo is a promising step for pixel diffusion. By decoupling the generation of high- and low-frequency components, it targets both efficiency and image quality. The code is publicly available at https://github.com/Zehong-Ma/DeCo, which makes it possible to reproduce and extend the work.
Future Directions
An open question is whether frequency decoupling carries over to other domains within generative modeling. Gains in speed and output quality could benefit applications such as virtual reality, gaming, and artistic content creation.
