SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification
arXiv:2604.15711v1
Introduction
Pathological diagnosis increasingly relies on computational image analysis. Regions of Interest (ROIs) supply the fundamental diagnostic evidence, while whole-slide image (WSI)-level tasks capture broader, aggregated patterns. To extract critical morphological features, recent work has adopted ROI-level Foundation Models (FMs) built on Vision Transformers (ViTs) and large-scale self-supervised learning (SSL). These models, however, face three significant limitations.
Core Limitations
- Cross-Magnification Domain Shift: Pretraining at a fixed scale hinders adaptation to the diverse clinical settings encountered in practice.
- Inadequate Local-Global Relationship Modeling: The ViT backbone in FMs suffers from high computational overhead and struggles with precise local characterization.
- Insufficient Fine-Grained Sensitivity: Traditional self-attention mechanisms often overlook subtle diagnostic cues that are crucial in pathology.
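The source does not quantify the computational overhead named in the second limitation. As a rough illustration, full self-attention compares every pair of tokens, so its cost grows with the square of the token count, whereas a linear-time sequence model grows in proportion to it. The sketch below works through that arithmetic; the 16-pixel patch size and the two image sizes are assumptions chosen for illustration, not values from the paper.

```python
def num_tokens(side_px: int, patch_px: int = 16) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = side_px // patch_px
    return per_side * per_side


def attention_pairs(n_tokens: int) -> int:
    """Token-pair interactions computed by full self-attention (quadratic in n)."""
    return n_tokens * n_tokens


# Hypothetical ROI sizes; the source does not specify any.
small = num_tokens(224)    # 14 * 14 patches
large = num_tokens(1024)   # 64 * 64 patches

# How much each cost grows when moving from the small to the large image.
token_growth = large / small
pair_growth = attention_pairs(large) / attention_pairs(small)
```

Under these assumptions the token count grows about 21-fold while the number of attention pairs grows by the square of that factor, which is one reason a linear-time backbone is attractive for larger inputs.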
Introduction of SSMamba
To address these challenges, we introduce SSMamba, a self-supervised hybrid framework designed to enable effective fine-grained feature learning without the need for extensive external datasets. The framework integrates three innovative domain-adaptive components:
- Mamba Masked Image Modeling (MAMIM): Designed to mitigate domain shift.
- Directional Multi-scale (DMS) Module: Provides balanced modeling of local-global relationships.
- Local Perception Residual (LPR) Module: Enhances fine-grained sensitivity so that subtle diagnostic cues are not overlooked.
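The summary does not describe the internals of these components. To give a feel for the general ideas the names suggest, the sketch below shows two generic building blocks: random patch masking as used in masked image modeling, and multi-directional scan orderings of the kind often used to feed a 2-D patch grid to a 1-D sequence model. It is not the authors' implementation; the function names, the 0.75 mask ratio, and the choice of four scan directions are all assumptions, and the multi-scale and LPR parts are not shown.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.75, seed=0):
    """Return a grid of booleans; True marks a patch hidden from the encoder."""
    n = grid_h * grid_w
    n_masked = int(n * mask_ratio)  # hypothetical ratio, not from the source
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), n_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def directional_scans(grid_h, grid_w):
    """Four 1-D orderings of the patch indices of a 2-D grid."""
    row_major = [r * grid_w + c for r in range(grid_h) for c in range(grid_w)]
    col_major = [r * grid_w + c for c in range(grid_w) for r in range(grid_h)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def visible_sequence(order, mask):
    """Patch indices in scan order, skipping the masked patches."""
    grid_w = len(mask[0])
    return [i for i in order if not mask[i // grid_w][i % grid_w]]


# Usage on a hypothetical 4 x 4 patch grid.
mask = random_patch_mask(4, 4)
scans = directional_scans(4, 4)
visible = {name: visible_sequence(order, mask) for name, order in scans.items()}
```

Each scan visits the same visible patches in a different order, so a sequence model sees every patch with context arriving from several directions.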
Methodology
SSMamba employs a two-stage pipeline: SSL pretraining on the target ROI datasets, followed by supervised fine-tuning (SFT). This tailors the model to the specific demands of pathological image classification.
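The ordering of the two stages can be sketched as a small skeleton. This is only a structural outline under stated assumptions: the `Checkpoint` class and both stage functions are hypothetical placeholders that record what happened rather than train anything, and the epoch counts and file names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Stand-in for model weights plus a record of how they were produced."""
    history: list = field(default_factory=list)
    has_classifier_head: bool = False


def ssl_pretrain(unlabeled_rois, epochs):
    """Stage 1 placeholder: learn from ROI images alone, no labels involved."""
    ckpt = Checkpoint()
    ckpt.history.append(("ssl_pretrain", len(unlabeled_rois), epochs))
    return ckpt


def supervised_finetune(ckpt, labeled_rois, epochs):
    """Stage 2 placeholder: start from stage-1 weights and add a classifier head."""
    if not ckpt.history or ckpt.history[-1][0] != "ssl_pretrain":
        raise ValueError("fine-tuning expects a checkpoint produced by SSL pretraining")
    tuned = Checkpoint(history=list(ckpt.history), has_classifier_head=True)
    tuned.history.append(("supervised_finetune", len(labeled_rois), epochs))
    return tuned


# Hypothetical toy data: labels are used only in stage 2.
rois = ["roi_0.png", "roi_1.png", "roi_2.png"]
labels = [0, 1, 0]

pretrained = ssl_pretrain(rois, epochs=2)
model = supervised_finetune(pretrained, list(zip(rois, labels)), epochs=1)
```

The point of the skeleton is the dependency: fine-tuning consumes the checkpoint produced by pretraining rather than starting from scratch.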
Performance Evaluation
SSMamba is reported to outperform 11 state-of-the-art (SOTA) pathological FMs on 10 public ROI datasets and 8 SOTA methods on 6 public WSI datasets. These results underscore the importance of task-specific architectural design in pathological image analysis.
Conclusion
SSMamba targets three limitations of existing ROI-level FMs: cross-magnification domain shift, inadequate local-global relationship modeling, and insufficient fine-grained sensitivity. By combining its domain-adaptive components with SSL pretraining on the target ROI data, it reports gains over existing pathological FMs on both ROI and WSI benchmarks and makes a case for task-specific architectures in pathological image classification.
