Region-Graph Optimal Transport Routing for Mixture-of-Experts Whole-Slide Image Classification
arXiv:2604.07298v1
Abstract
Multiple Instance Learning (MIL) is the dominant framework for gigapixel whole-slide image (WSI) classification in computational pathology. However, existing MIL aggregators typically route all instances through a shared pathway, which limits their ability to specialize across the pathological heterogeneity within a slide.
The Challenge of Current Approaches
To address this limitation, Mixture-of-Experts (MoE) methods partition instances across specialized expert subnetworks. However, unconstrained softmax routing can yield highly imbalanced utilization: one or a few experts absorb most of the routing mass, collapsing the mixture back to a near-single-pathway solution.
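The imbalance described above can be made concrete by measuring how much routing mass each expert receives. The following is a minimal sketch, not code from the paper: the token count, expert count, and the constant bias added to expert 0 are all hypothetical values chosen to mimic a router that has drifted toward one expert.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def expert_load(routing):
    """Fraction of the total routing mass received by each expert.

    routing: (n_tokens, n_experts) array of non-negative routing weights.
    """
    return routing.sum(axis=0) / routing.sum()

rng = np.random.default_rng(0)
n_tokens, n_experts = 512, 4             # hypothetical sizes
logits = rng.normal(size=(n_tokens, n_experts))
logits[:, 0] += 3.0                      # assumption: router is biased toward expert 0

load = expert_load(softmax(logits))
print("per-expert load:", np.round(load, 3))
```

With unconstrained softmax routing nothing pulls the per-expert load back toward a uniform split, so a bias of this kind translates directly into one dominant expert.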
Introducing ROAM
To tackle these challenges, we propose ROAM (Region-graph OptimAl-transport Mixture-of-experts), a spatially aware MoE-MIL aggregator. ROAM routes region tokens to expert poolers through capacity-constrained entropic optimal transport. This approach promotes balanced expert utilization by design.
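As an illustration of capacity-constrained entropic optimal transport, the sketch below runs plain Sinkhorn scaling between region tokens and experts. It is an assumption-laden sketch rather than ROAM's implementation: the function name `sinkhorn_route`, the uniform mass per region, the use of negative router scores as the transport cost, and the values of `eps` and `n_iters` are all choices made here for illustration.

```python
import numpy as np

def sinkhorn_route(scores, capacity, eps=0.5, n_iters=200):
    """Entropic OT routing of regions to experts (illustrative sketch).

    scores:   (n_regions, n_experts) router affinities; cost is taken as -scores.
    capacity: (n_experts,) target share of routing mass per expert.
    Returns a transport plan P whose column sums equal the normalized capacity.
    """
    n, k = scores.shape
    a = np.full(n, 1.0 / n)                      # assumption: equal mass per region
    b = capacity / capacity.sum()                # per-slide capacity marginal
    K = np.exp((scores - scores.max()) / eps)    # Gibbs kernel, shifted for stability
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = a / (K @ v)                          # match region (row) marginals
        v = b / (K.T @ u)                        # match expert (column) capacities
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
scores = rng.normal(size=(512, 4))
scores[:, 0] += 3.0                              # same hypothetical bias toward expert 0
P = sinkhorn_route(scores, capacity=np.full(4, 0.25))
print("per-expert load:", np.round(P.sum(axis=0), 3))
```

Because the loop ends on the column update, the expert marginals are met exactly (up to floating point) even when the raw scores favor a single expert, which is the sense in which balance holds by construction rather than through an auxiliary loss.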
Key Mechanisms of ROAM
ROAM operates on spatial region tokens, obtained by compressing dense patch bags into spatially binned units, which aligns routing with local tissue neighborhoods. It introduces two key mechanisms:
- Region-to-Expert Assignment: Formulated as entropic optimal transport (Sinkhorn) with explicit per-slide capacity marginals, this mechanism enforces balanced expert utilization without auxiliary load-balancing losses.
- Graph-Regularized Sinkhorn Iterations: These iterations diffuse routing assignments over the spatial region graph, encouraging neighboring regions to route coherently to the same experts.
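The pieces described above (spatial binning into region tokens, a region graph, and Sinkhorn iterations with graph diffusion) can be sketched end to end as follows. This is one plausible reading, not the paper's algorithm: mean pooling per bin, a 4-neighbor grid graph, and the specific update that blends each region's assignment with its neighbors' mean before rescaling to the marginals are all assumptions, as are the names `bin_patches`, `grid_adjacency`, `graph_sinkhorn` and the parameters `bin_size`, `lam`, `eps`.

```python
import numpy as np

def bin_patches(coords, feats, bin_size):
    """Mean-pool patch features into spatial bins (assumed pooling rule)."""
    cells = np.floor_divide(coords, bin_size).astype(int)
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    tokens = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(tokens, inverse, feats)            # sum features per bin
    counts = np.bincount(inverse, minlength=len(uniq))
    return tokens / counts[:, None], uniq

def grid_adjacency(cells):
    """Row-normalized 4-neighbor adjacency between occupied grid cells."""
    index = {tuple(c): i for i, c in enumerate(cells.tolist())}
    A = np.zeros((len(cells), len(cells)))
    for (x, y), i in index.items():
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = index.get((x + dx, y + dy))
            if j is not None:
                A[i, j] = 1.0
    for i in np.flatnonzero(A.sum(axis=1) == 0):
        A[i, i] = 1.0                            # isolated regions keep their own assignment
    return A / A.sum(axis=1, keepdims=True)

def graph_sinkhorn(scores, W, capacity, eps=0.5, lam=0.5, n_iters=100):
    """Sinkhorn scaling with a graph-diffusion step (hypothetical variant)."""
    n, k = scores.shape
    a = np.full(n, 1.0 / n)
    b = capacity / capacity.sum()
    P = np.exp((scores - scores.max()) / eps)
    for _ in range(n_iters):
        P = (1.0 - lam) * P + lam * (W @ P)      # diffuse assignments to neighbors
        P *= (a / P.sum(axis=1))[:, None]        # rescale to region marginals
        P *= (b / P.sum(axis=0))[None, :]        # rescale to expert capacities
    return P

rng = np.random.default_rng(0)
coords = rng.uniform(0, 4096, size=(2000, 2))    # hypothetical patch positions (pixels)
feats = rng.normal(size=(2000, 16))              # stand-in for frozen patch embeddings
tokens, cells = bin_patches(coords, feats, bin_size=512)
W = grid_adjacency(cells)
router = rng.normal(size=(16, 4))                # hypothetical linear router weights
P = graph_sinkhorn(tokens @ router, W, capacity=np.full(4, 0.25))
print(tokens.shape, np.round(P.sum(axis=0), 3))
```

In this sketch `lam` controls how strongly neighboring regions are pulled toward the same experts; setting it to 0 recovers ordinary Sinkhorn scaling on the unsmoothed kernel.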
Performance Evaluation
ROAM was evaluated on four WSI benchmarks using frozen foundation-model patch embeddings, where it performs competitively with strong MIL and MoE baselines. On NSCLC generalization (TCGA-CPTAC), it reaches an external AUC of 0.845 ± 0.019.
Conclusion
In summary, ROAM applies region-graph optimal transport routing within a Mixture-of-Experts framework to address two problems in whole-slide image classification: pathological heterogeneity within a slide and imbalanced expert utilization.
