MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis
In computational pathology, Whole Slide Image (WSI) analysis is central to cancer diagnosis. It integrates morphological and architectural cues across magnifications, helping pathologists make better-informed decisions. Multiple Instance Learning (MIL) is the most widely adopted framework for WSI analysis. Recently, Mamba has become a promising backbone for MIL, overtaking Transformers thanks to its efficiency and its global context modeling, a capability that originates in Natural Language Processing (NLP).
Despite these advantages, Mamba-based MIL approaches face three critical challenges:
- Disruption of 2D Spatial Locality: Flattening the 2D grid of tiles into a 1D sequence separates tiles that are neighbors in the slide, destroying the spatial locality that accurate analysis depends on.
- Sub-optimal Local Cellular Structure Modeling: Existing methods struggle to capture the fine-grained local cellular structures that precise diagnosis requires.
- High Memory Peaks During Inference: Peak memory usage during inference is high, which makes deployment on resource-constrained edge devices difficult.
Recent studies, such as MambaOut, have indicated that the State Space Model (SSM) component of Mamba is redundant for local feature extraction, where Gated CNNs suffice. WSI analysis needs both fine-grained local feature extraction, as in natural-image tasks, and global context modeling, as in NLP. To meet both needs, we introduce MambaBack, a novel hybrid architecture that combines the strengths of Mamba and MambaOut.
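To make the idea of a Gated CNN concrete, the following is a minimal NumPy sketch of a 1D gated convolution block. It is an illustration only, not the MambaOut or MambaBack implementation: the function names, the weight shapes, the tanh-approximated GELU, and the parameter-free layer normalization are all assumptions chosen for brevity. The point it shows is that token mixing happens only inside a small sliding window, so each output depends on a local neighborhood of the sequence.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (an assumed choice of activation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def depthwise_conv1d(x, kernels):
    """Per-channel sliding-window filter with zero padding.

    x: (L, C) sequence of L tokens with C channels.
    kernels: (K, C) with K odd; one length-K filter per channel.
    """
    k_size = kernels.shape[0]
    pad = k_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the sequence axis
    out = np.zeros_like(x)
    for k in range(k_size):
        out += padded[k:k + x.shape[0]] * kernels[k]
    return out


def gated_cnn_block(x, w_in, kernels, w_out):
    """Hypothetical gated CNN block.

    x: (L, D), w_in: (D, 2H), kernels: (K, H), w_out: (H, D).
    """
    # Layer norm over features, without learnable scale or shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    h = (x - mean) / np.sqrt(var + 1e-6)
    # One projection produces both the gate branch and the value branch.
    gate, value = np.split(h @ w_in, 2, axis=-1)
    # Only the value branch mixes tokens, and only within K neighbors.
    value = depthwise_conv1d(value, kernels)
    # Element-wise gating, output projection, residual connection.
    return x + (gelu(gate) * value) @ w_out
```

With a kernel size of 3, changing one input token can only affect that token's output and its two immediate neighbors, which is the local behavior the text attributes to Gated CNNs.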
The MambaBack architecture incorporates several innovative strategies:
- Hilbert Sampling Strategy: This technique is employed to maintain the 2D spatial locality of tiles while converting them into 1D sequences, significantly enhancing the model’s spatial perception.
- Hierarchical Structure: MambaBack features a hierarchical design that includes a 1D Gated CNN block derived from MambaOut, which is responsible for capturing local cellular features, alongside a BiMamba2 block that aggregates global context, thereby promoting a multi-scale representation.
- Asymmetric Chunking Design: This design enables parallel processing during the training phase and chunking-streaming accumulation during inference, effectively minimizing peak memory usage and making the architecture more suitable for deployment on resource-constrained devices.
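The locality argument behind Hilbert sampling can be sketched in a few lines of Python. The code below uses the standard coordinate-to-index mapping for the Hilbert curve and sorts tiles by that index. It assumes tiles are identified by integer (column, row) grid coordinates; the helper names `hilbert_index` and `hilbert_order` are hypothetical, and the paper's actual sampling procedure may differ in its details.

```python
def hilbert_index(side, x, y):
    """Position of cell (x, y) along the Hilbert curve of a side x side grid.

    side must be a power of two, with 0 <= x, y < side.
    """
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(coords):
    """Return tile indices sorted along the Hilbert curve.

    coords: non-empty list of (x, y) integer grid coordinates (an assumed input
    format); the grid is padded up to the next power of two.
    """
    side = 1
    max_coord = max(max(x, y) for x, y in coords)
    while side <= max_coord:
        side *= 2
    return sorted(range(len(coords)),
                  key=lambda i: hilbert_index(side, coords[i][0], coords[i][1]))
```

On a full power-of-two grid, consecutive positions in this ordering are always adjacent tiles, whereas row-major flattening jumps across the whole slide width at the end of every row. On real slides, where only tissue tiles are kept, the ordering has gaps but still tends to keep nearby tiles close together in the sequence.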
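The asymmetric chunking idea can be illustrated with a toy linear recurrence. This is a sketch under strong assumptions and not the MambaBack implementation: the recurrence h_t = a * h_{t-1} + b * x_t, the mean pooling, and all function names are hypothetical stand-ins. The training-style path materializes every hidden state for the whole sequence, while the inference-style path consumes the sequence chunk by chunk and keeps only a carried state and a running sum.

```python
import numpy as np


def pooled_full(x, a, b):
    """Training-style pass: keep all L hidden states, then mean-pool them.

    x: (L, D) sequence; a, b: (D,) per-channel recurrence coefficients.
    """
    h = np.zeros(x.shape[1])
    states = np.empty_like(x)  # O(L * D) memory for the hidden states
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        states[t] = h
    return states.mean(axis=0)


def iter_chunks(x, chunk_size):
    """Yield consecutive slices of x with at most chunk_size rows."""
    for start in range(0, x.shape[0], chunk_size):
        yield x[start:start + chunk_size]


def pooled_streaming(chunks, a, b, dim):
    """Inference-style pass: stream chunks, carrying state across boundaries.

    Only the current chunk, the carried state h, and a running sum are live,
    so state memory is O(D) regardless of sequence length.
    """
    h = np.zeros(dim)
    total = np.zeros(dim)
    count = 0
    for chunk in chunks:
        for t in range(chunk.shape[0]):
            h = a * h + b * chunk[t]
            total += h
        count += chunk.shape[0]
    return total / count
```

Because the state is carried across chunk boundaries, both paths produce the same pooled vector up to floating-point rounding, so the chunk size becomes a pure memory knob at inference time.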
Experimental evaluations conducted on five different datasets demonstrate that MambaBack significantly outperforms seven state-of-the-art methods in WSI analysis. The source code and datasets are publicly available, fostering further research and development in this crucial area of computational pathology.
