MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer
Source: arXiv:2604.12281v1 (cross-listed)
Abstract: Style transfer aims to render a content image with the visual characteristics of a reference style while preserving its underlying semantic layout and structural geometry. While recent diffusion-based models demonstrate strong stylization capabilities by leveraging powerful generative priors and controllable internal representations, they typically assume a single global style. Extending them to multi-style scenarios often leads to boundary artifacts, unstable stylization, and structural inconsistency due to interference between multiple style representations. To overcome these limitations, we propose MAST (Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer), a novel training-free framework that explicitly controls content-style interactions within the diffusion attention mechanism.
Key Features of MAST
MAST integrates four connected modules aimed at artifact-free, structure-preserving stylization in multi-style scenarios:
- Layout-preserving Query Anchoring: This module prevents global layout collapse by firmly anchoring the semantic structure using content queries.
- Logit-level Attention Mass Allocation: This feature deterministically distributes attention probability mass across spatial regions, seamlessly fusing multiple styles without boundary artifacts.
- Sharpness-aware Temperature Scaling: This component restores the attention sharpness that is degraded by multi-style expansion.
- Discrepancy-aware Detail Injection: This module adaptively compensates for localized high-frequency detail losses by measuring structural discrepancies, maintaining the integrity of fine details in the image.
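To make the idea of allocating attention mass at the logit level more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary above does not give the paper's formulas, so the bias rule, the function name allocate_attention_mass, and the parameters mass and tau are all assumptions made for illustration. The sketch adds a per-style bias to the attention logits so that, after a single softmax over all style keys, the probability mass each query assigns to style k equals a prescribed mask value. A temperature tau below 1 sharpens attention inside each style block without changing that allocation, which is one plausible reading of how temperature scaling could interact with mass allocation.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along `axis`, keeping dimensions."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def allocate_attention_mass(q, style_keys, style_values, mass, tau=1.0, eps=1e-12):
    """Illustrative sketch (an assumption, not the paper's exact rule).

    q            : (N, d) content queries, one per spatial position
    style_keys   : list of K arrays, each (M_k, d)
    style_values : list of K arrays, each (M_k, d_v)
    mass         : (N, K) target attention mass per position and style;
                   each row sums to 1 (for example, soft region masks)
    tau          : temperature; values below 1 sharpen attention within a style
    """
    d = q.shape[-1]
    biased = []
    for k, keys in enumerate(style_keys):
        # Scaled dot-product logits for style k, with temperature applied.
        logits = q @ keys.T / (np.sqrt(d) * tau)           # (N, M_k)
        # Bias that cancels the block's own normalizer and installs log(mass).
        log_z = logsumexp(logits, axis=-1)                  # (N, 1)
        log_mass = np.log(np.maximum(mass[:, k:k + 1], eps))
        biased.append(logits + log_mass - log_z)
    all_logits = np.concatenate(biased, axis=-1)            # (N, sum of M_k)
    # One softmax over every style key; block k now carries mass[:, k].
    attn = np.exp(all_logits - logsumexp(all_logits, axis=-1))
    out = attn @ np.concatenate(style_values, axis=0)       # (N, d_v)
    return attn, out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))
    keys = [rng.normal(size=(5, 8)), rng.normal(size=(7, 8))]
    values = [rng.normal(size=(5, 3)), rng.normal(size=(7, 3))]
    # Two positions belong fully to style 0; two sit on a soft boundary.
    mass = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]])
    attn, out = allocate_attention_mass(q, keys, values, mass)
    print("mass on style 0 per position:", attn[:, :5].sum(axis=-1))
```

Because the bias subtracts each block's log-normalizer before adding the log of the target mass, the allocation depends only on the mask and not on how strongly a query happens to match one style's keys. This is the sense in which the distribution is deterministic in the sketch. Whether MAST uses this exact construction is not stated in the summary above.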
Advantages of MAST
The authors report that, in extensive experiments, MAST mitigates boundary artifacts and maintains structural consistency. The reported advantages are:
- Artifact Mitigation: MAST successfully reduces noticeable boundary artifacts that often accompany multi-style applications.
- Structural Consistency: The framework preserves the underlying structure of the content image, ensuring that the semantic layout remains intact.
- Texture Fidelity: The textures of each reference style are reproduced with high fidelity.
- Spatial Coherence: Spatial coherence is preserved even as the number of applied styles increases.
Conclusion
MAST targets the main failure modes of diffusion-based style transfer in multi-style scenarios: boundary artifacts and structural inconsistency. Because it is training-free and controls content-style interactions directly within the attention mechanism, it gives artists and developers finer control over the stylization process, which makes it a promising tool for digital art and image processing.
