Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models
arXiv:2604.10963v1
Abstract
Medical image segmentation supports clinical workflows by precisely delineating anatomical structures and lesions. However, medical image datasets suffer from acquisition noise and annotation ambiguity, which introduce pervasive data uncertainty and substantially undermine model robustness. Existing research focuses primarily on improving model architectures and estimating predictive reliability, while the intrinsic uncertainty of the data itself has not been explored systematically.
To address this gap, this work proposes leveraging the universal representation capabilities of visual foundation models to estimate inherent data uncertainty. Specifically, we analyze the feature diversity of the model’s decoded representations and quantify their singular value energy to define the semantic perception scale for each class, thereby measuring sample difficulty and aleatoric uncertainty.
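As a rough illustration of the idea of quantifying singular value energy per class, the following is a minimal sketch and not the paper's formulation. It assumes per-pixel feature vectors have already been extracted from a foundation model's decoder and flattened to shape `(num_pixels, feature_dim)`, with an integer class label per pixel. The function names, the mean-centering step, and the normalization by pixel count are all assumptions made for this example; the source does not give the exact definition of the semantic perception scale.

```python
import numpy as np


def singular_value_energy(features: np.ndarray) -> float:
    """Sum of squared singular values of the mean-centered feature matrix.

    Assumption: "energy" is taken as the squared singular values, which
    equals the total variance (squared Frobenius norm) of the centered features.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return float(np.sum(singular_values ** 2))


def per_class_scale(features: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical per-class score: energy divided by the class pixel count."""
    scales = np.zeros(num_classes)
    for c in range(num_classes):
        class_feats = features[labels == c]
        if class_feats.shape[0] < 2:
            continue  # too few pixels to measure feature diversity; leave at 0
        scales[c] = singular_value_energy(class_feats) / class_feats.shape[0]
    return scales
```

Under these assumptions, a class whose pixels map to nearly identical features receives a score close to zero, while a class with widely scattered features receives a larger score.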
Proposed Strategies
Based on this foundation, we design two uncertainty-driven application strategies:
- Aleatoric Uncertainty-Aware Data Filtering Mechanism: This mechanism aims to eliminate potentially noisy samples, thereby enhancing model learning quality.
- Dynamic Uncertainty-Aware Optimization Strategy: This strategy adaptively adjusts class-specific loss weights during training based on the semantic perception scale. It is combined with a label denoising mechanism to improve training stability.
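The two strategies above might be sketched as follows, under stated assumptions. The source does not specify the filtering threshold, the weighting formula, or whether more uncertain classes are weighted up or down. This example keeps a fixed fraction of the lowest-uncertainty samples and up-weights classes with larger scores using a softmax. The function names and the `keep_fraction` and `temperature` parameters are hypothetical, and the label denoising mechanism is not shown.

```python
import numpy as np


def filter_by_uncertainty(scores, keep_fraction=0.9):
    """Return sorted indices of the lowest-uncertainty samples.

    Assumption: samples with the highest scores are treated as potentially noisy.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    order = np.argsort(scores, kind="stable")  # ascending uncertainty
    return np.sort(order[:n_keep])


def class_weights_from_scale(scales, temperature=1.0):
    """Map per-class scores to loss weights with mean 1 (softmax-based).

    Assumption: classes with larger scores receive larger weights.
    """
    scales = np.asarray(scales, dtype=float)
    z = scales / temperature
    z = z - z.max()  # subtract the max for numerical stability
    weights = np.exp(z)
    return weights / weights.sum() * scales.size


# Example usage with made-up scores
kept = filter_by_uncertainty([0.1, 0.9, 0.2, 0.3], keep_fraction=0.75)
weights = class_weights_from_scale([0.0, 1.0])
```

In a training loop, the weights would be recomputed as the per-class scores change and passed to a class-weighted loss. How often they are updated is a design choice the source does not describe.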
Experimental Results
We conducted extensive experiments on five public datasets spanning CT and MRI modalities and covering multi-organ and tumor segmentation tasks. The results show that our method yields significant and robust performance improvements across a range of mainstream network architectures. These findings point to the broad potential of aleatoric uncertainty estimation in medical image understanding and segmentation.
Conclusion
Accounting for aleatoric uncertainty improves the robustness and performance of medical image segmentation models intended for clinical use. Visual foundation models offer a way to estimate and mitigate the intrinsic data uncertainty in medical imaging datasets. This work motivates further research on data uncertainty as a route to more reliable clinical applications.
