SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization
arXiv:2603.17219v2
Abstract
Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, where the marginal distribution of voxel intensities P(𝑥) varies non-linearly across acquisition protocols while the conditional anatomy P(𝑦|𝑥) remains constant. This is particularly detrimental to radiomic reproducibility, where acquisition variance often exceeds biological pathology variance. Existing statistical harmonization methods (e.g., ComBat) operate in feature space, precluding spatial downstream tasks, while standard deep learning approaches are theoretically bounded by local effective receptive fields (ERF), failing to model the global intensity correlations characteristic of field-strength bias.
Introduction
Pooling MRI data across sites requires harmonization, because differences between scanners and acquisition protocols alter voxel intensities and, with them, the radiomic features derived from the images. SA-CycleGAN-2.5D addresses this with a deep learning model that harmonizes the images themselves rather than features extracted from them.
Key Innovations
The SA-CycleGAN-2.5D framework combines three architectural components:
- 2.5D Tri-Planar Manifold Injection: This approach preserves through-plane gradients ∇z at O(HW) complexity, retaining cross-slice context without the cost of full 3D processing.
- U-ResNet Generator: The generator uses dense voxel-to-voxel self-attention to go beyond the O(√L) receptive field limit of conventional convolutional neural networks (CNNs), which lets it model global scanner field biases.
- Spectrally-Normalized Discriminator: This component constrains the Lipschitz constant (K_D ≤ 1), which stabilizes adversarial optimization during training.
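As a rough illustration of the three components listed above, the NumPy sketch below shows one way each idea could look in isolation. It is not the paper's implementation: stacking slices k-1, k, k+1 as channels, the single-head dot-product form of attention, the power-iteration estimate of the spectral norm, and all function names are assumptions chosen for clarity.

```python
import numpy as np

def stack_adjacent_slices(volume, k):
    """Return slices k-1, k, k+1 of a (D, H, W) volume as a (3, H, W) array.

    Assumption: neighbouring slices act as input channels, and the edge
    slice is repeated at the volume boundary.
    """
    depth = volume.shape[0]
    idx = np.clip([k - 1, k, k + 1], 0, depth - 1)
    return volume[idx]

def self_attention(features, w_q, w_k, w_v):
    """Single-head dot-product self-attention over N flattened positions.

    features: (N, C); w_q and w_k: (C, d); w_v: (C, C_out).
    Every output row is a weighted sum over all N positions, so the
    receptive field is global. The (N, N) weight matrix is the main cost.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

def spectral_normalize(weight, n_iter=50, seed=0):
    """Divide a 2D weight matrix by its estimated largest singular value.

    Power iteration approaches the true value from below, so the result
    has spectral norm close to (not exactly) 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    sigma = u @ weight @ v  # estimate of the largest singular value
    return weight / sigma
```

A linear layer whose weight has spectral norm at most 1 is 1-Lipschitz, which is the property the K_D ≤ 1 constraint refers to. In practice, deep learning frameworks provide their own spectral normalization utilities, so a hand-written version like this is mainly useful for understanding the mechanism.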
Evaluation and Results
SA-CycleGAN-2.5D was evaluated on 654 glioma patients from two institutional domains, BraTS and UPenn-GBM. Harmonization reduced the Maximum Mean Discrepancy (MMD) by 99.1%, from 1.729 to 0.015. The accuracy of the domain classifier fell to 59.7%, near the chance level for two domains, which indicates that little site-specific signal remains after harmonization.
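To make the MMD figures concrete, the sketch below computes a biased squared-MMD estimate between two sets of scalar samples and checks the quoted relative reduction. The source does not state which kernel, estimator, or input features were used, so the RBF kernel, the `gamma` bandwidth, and the function names here are assumptions for illustration only.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel between two scalar samples."""
    return math.exp(-gamma * (a - b) ** 2)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf(a, b, gamma) for a in xs for b in ys)
    return total / (len(xs) * len(ys))

def mmd2_biased(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))

def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before

# The reported values: (1.729 - 0.015) / 1.729 is about 0.9913, i.e. 99.1%.
reduction_pct = 100.0 * relative_reduction(1.729, 0.015)
```

Identical sample sets give an estimate of zero, and well-separated sets give a value approaching 2 for this kernel, so a drop toward zero means the two domains have become hard to tell apart under the chosen kernel.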
Ablation studies showed that global attention has a large, statistically significant effect in the more challenging heterogeneous-to-homogeneous translation direction (Cohen’s d = 1.32, p < 0.001).
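For readers unfamiliar with the effect size quoted here, the following is a minimal sketch of Cohen's d using the pooled standard deviation of two independent groups. Which metric and which groups the ablation compared is not stated in this summary, so the function name and the toy inputs are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Toy data: means 4 and 3, both sample variances 4, so d = 1 / 2 = 0.5.
d_example = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual rule of thumb, values around 0.8 or above count as a large effect, so d = 1.32 means the group means differ by more than one pooled standard deviation.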
Conclusion
By bridging the gap between 2D efficiency and 3D consistency, the SA-CycleGAN-2.5D framework provides voxel-level harmonized images that maintain tumor pathophysiology, which supports reproducible multi-center radiomic analysis.
