Detecting and Refurbishing Ground Truth Errors During Training of Deep Learning-Based Echocardiography Segmentation Models
arXiv:2604.12832v1
Abstract
Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through manual annotation, but these can be prone to random errors or systematic biases. This study examines the robustness of deep learning models to such errors in echocardiography (echo) segmentation and evaluates a novel strategy for detecting and refurbishing erroneous labels during model training.
Introduction
The use of deep learning models in medical image segmentation has gained significant traction in recent years, particularly in the field of echocardiography. Accurate segmentation is crucial for diagnosing cardiac conditions and guiding treatment decisions. However, the reliance on manually annotated ground truth labels introduces the risk of errors that can adversely impact the model’s performance.
Methodology
This study utilizes the CAMUS dataset to investigate the effect of ground truth label errors on model performance. We simulate three types of errors:
- Random errors
- Systematic biases
- Label omissions
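The three error types above can be illustrated with a small NumPy sketch. The summary does not say how each error was generated, so the concrete choices here are assumptions made for illustration: a random translation stands in for a random error, a fixed dilation (consistent over-segmentation) stands in for a systematic bias, and an emptied mask stands in for a label omission. The function names `dilate`, `random_shift`, and `corrupt_labels` are hypothetical.

```python
import numpy as np


def dilate(mask, iterations=1):
    """4-neighbour binary dilation via padded shifts (NumPy only)."""
    out = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out


def random_shift(mask, rng, max_shift=5):
    """Assumed random error: translate the mask by a random offset."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = mask.shape
    # Crop the part of the mask that stays inside the image after the shift.
    src = mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out = np.zeros_like(mask)
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def corrupt_labels(masks, error_rate, kind, seed=0):
    """Corrupt a fraction `error_rate` of binary masks shaped (N, H, W).

    Returns the corrupted masks and a boolean array marking which
    samples were altered.
    """
    rng = np.random.default_rng(seed)
    n = len(masks)
    n_bad = int(round(error_rate * n))
    flags = np.zeros(n, dtype=bool)
    flags[rng.choice(n, size=n_bad, replace=False)] = True

    out = masks.astype(bool).copy()
    for i in np.flatnonzero(flags):
        if kind == "random":
            out[i] = random_shift(out[i], rng)
        elif kind == "systematic":
            out[i] = dilate(out[i], iterations=2)  # assumed bias: over-segmentation
        elif kind == "omission":
            out[i] = False                         # assumed omission: empty label
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return out, flags
```

Keeping the `flags` array alongside the corrupted masks makes it possible to check afterwards whether a detection method recovers the samples that were actually altered.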
Subsequently, we compare two error detection methods: a loss-based ground truth label error detection method and a novel approach based on Variance of Gradients (VOG). Additionally, we propose a pseudo-labelling strategy aimed at refurbishing suspected erroneous GT labels.
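A minimal sketch of how VOG-style scoring and pseudo-label refurbishment might fit together is given below. It rests on several assumptions not stated in the summary: that input gradients for each training sample have been saved at K checkpoints, that a sample's score is the per-pixel variance of those gradients across checkpoints averaged over pixels, that a fixed fraction of the highest-scoring samples is flagged, and that a flagged label is replaced by the model's thresholded prediction. The names `vog_scores`, `flag_top_fraction`, and `refurbish` are hypothetical, and the paper's actual formulation may differ.

```python
import numpy as np


def vog_scores(grads):
    """Per-sample VOG-style score.

    grads: array shaped (K, N, H, W) holding the input gradients of N
    samples saved at K training checkpoints (assumed to be precomputed).
    """
    per_pixel_var = grads.var(axis=0)        # variance across checkpoints: (N, H, W)
    return per_pixel_var.mean(axis=(1, 2))   # average over pixels: (N,)


def flag_top_fraction(scores, fraction):
    """Flag the `fraction` of samples with the highest scores."""
    n_flag = int(round(fraction * len(scores)))
    flags = np.zeros(len(scores), dtype=bool)
    if n_flag > 0:
        flags[np.argsort(scores)[-n_flag:]] = True
    return flags


def refurbish(labels, probs, flags, threshold=0.5):
    """Replace flagged labels with pseudo-labels from model probabilities.

    labels: boolean masks shaped (N, H, W)
    probs:  predicted foreground probabilities shaped (N, H, W)
    """
    out = labels.copy()
    out[flags] = probs[flags] > threshold
    return out


# Toy usage: sample 2 has much noisier gradients than the others.
rng = np.random.default_rng(0)
grads = rng.normal(0.0, 0.01, size=(5, 4, 8, 8))
grads[:, 2] *= 100
flags = flag_top_fraction(vog_scores(grads), fraction=0.25)
```

The same `flag_top_fraction` helper could serve a loss-based detector by passing per-sample training losses in place of VOG scores, which would let the two detection signals be compared on identical terms.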
Results
Our findings show that the VOG method was particularly effective in identifying erroneous ground truth labels during training. At the same time, the standard U-Net architecture proved robust on its own, maintaining accuracy in the presence of random label errors and of moderate levels of systematic errors (up to 50%).
Discussion
The ability to detect and refurbish erroneous GT labels is essential to the reliability of deep learning models in medical imaging. The proposed approach improved performance under high-error conditions and also provided insight into the nature of the errors affecting model training.
Conclusion
This study underscores the importance of addressing ground truth errors in deep learning-based segmentation models. By employing methods such as VOG for error detection and pseudo-labelling for refurbishment, we can significantly enhance the robustness and applicability of these models in clinical settings. Future work will focus on refining these methods and exploring their applicability across different medical imaging modalities.
Keywords
- Deep learning
- Echocardiography
- Image segmentation
- Ground truth errors
- Variance of Gradients (VOG)
