A Generative Foundation Model for Multimodal Histopathology
arXiv:2604.03635v1 (cross-listed)
Abstract: Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability.
Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse cross-modal synthesis tasks with minimal or no task-specific fine-tuning.
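The abstract names "decoupled cross-modal attention" without defining it. One well-known design with that name, used in IP-Adapter, shares a single query projection over the noisy image latents and gives each conditioning modality its own key/value projections. The per-modality attention outputs are then summed, so any subset of modalities can be supplied. The sketch below assumes that design, with text and RNA tokens already embedded at the same width as the latents. The class name, dimensions, and random weights are hypothetical and are not taken from MuPD.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


class DecoupledCrossAttention:
    """Hypothetical sketch: one query projection shared by all branches,
    separate key/value projections for each conditioning modality."""

    def __init__(self, dim: int, modalities, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        # One (W_k, W_v) pair per modality, e.g. "text" and "rna".
        self.w_kv = {
            m: (rng.normal(0.0, scale, (dim, dim)), rng.normal(0.0, scale, (dim, dim)))
            for m in modalities
        }

    def __call__(self, latents: np.ndarray, conditions: dict) -> np.ndarray:
        # latents: (n_tokens, dim) noisy image latent tokens.
        # conditions: modality name -> (m_tokens, dim); missing modalities are simply absent.
        q = latents @ self.w_q
        out = np.zeros_like(latents)
        for name, tokens in conditions.items():
            w_k, w_v = self.w_kv[name]
            out += attention(q, tokens @ w_k, tokens @ w_v)
        return latents + out  # residual connection


# Usage with random stand-in tokens.
layer = DecoupledCrossAttention(dim=16, modalities=["text", "rna"])
rng = np.random.default_rng(1)
x = rng.normal(size=(64, 16))     # 64 image latent tokens
text = rng.normal(size=(8, 16))   # 8 text tokens
rna = rng.normal(size=(32, 16))   # 32 RNA tokens
y_both = layer(x, {"text": text, "rna": rna})
y_text = layer(x, {"text": text})
print(y_both.shape)  # (64, 16)
```

Because each modality has its own branch, omitting a modality just removes its term from the sum. This is one way a model of this kind could condition on whatever data happen to be available for a given sample.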
Key Features of MuPD
- Cross-Modal Synthesis: MuPD performs a range of tasks that require synthesizing data across modalities, including text-conditioned and image-to-image generation.
- Histologically Faithful Tissue Architectures: The model synthesizes realistic tissue structures and reduces Fréchet inception distance (FID) by 50% relative to domain-specific models.
- Improved Classification Accuracy: Using MuPD-generated images for synthetic data augmentation improves few-shot classification accuracy by up to 47%.
- RNA-Conditioned Generation: For RNA-conditioned histology generation, MuPD achieves a 23% reduction in FID compared to alternative methods while maintaining accurate cell-type distributions across five cancer types.
- Virtual Staining Capabilities: MuPD acts as a virtual stainer, translating H&E images into immunohistochemistry (IHC) and multiplex immunofluorescence (mIF) images and improving average marker correlation by 37% over existing approaches.
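The FID and marker-correlation figures above are relative results. FID compares the mean μ and covariance Σ of feature embeddings of real (r) and generated (g) images: FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). Lower is better, so a 50% reduction means the score is halved. The summary does not define "average marker correlation"; the sketch below assumes a Pearson r computed per marker between predicted and measured intensities, then averaged over markers. FID features are conventionally taken from an Inception-v3 network. Here random arrays stand in for them, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style distance between two feature sets of shape (n_samples, dim)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((A B)^{1/2}) equals Tr((A^{1/2} B A^{1/2})^{1/2}), and the inner matrix is symmetric PSD.
    root_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(root_a @ cov_b @ root_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))


def mean_marker_correlation(pred: np.ndarray, target: np.ndarray) -> float:
    """Average Pearson r across markers; both arrays have shape (n_samples, n_markers)."""
    rs = [np.corrcoef(pred[:, j], target[:, j])[0, 1] for j in range(pred.shape[1])]
    return float(np.mean(rs))


# Usage with random stand-in features.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
same = rng.normal(size=(500, 8))                # same distribution as real
shifted = rng.normal(loc=1.0, size=(500, 8))    # mean shifted by 1 in every dimension
print(frechet_distance(real, same) < frechet_distance(real, shifted))  # True
```

Because the scores only make sense relative to a shared feature extractor and test set, reported reductions compare methods evaluated under the same setup.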
Implications for Multimodal Histopathology
The introduction of MuPD is a significant advance for multimodal histopathology. The researchers show that a single, unified generative model pretrained across multiple pathology modalities can substantially outperform specialized alternatives. The result is a scalable computational framework for addressing incomplete data in histopathological analysis.
Furthermore, by embedding diverse data types in a shared latent space, MuPD may support richer diagnostic and therapeutic insights, which matter for the accurate treatment of complex diseases. Its implications could extend beyond research: synthetic data from the model could potentially support clinical workflows, for example by augmenting the training sets of machine learning models.
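The reported few-shot gains come from augmenting small labeled sets with synthetic images, but the summary does not say which classifier was used. As an illustrative assumption, the sketch below uses a prototype (nearest-centroid) classifier over embeddings, a common few-shot baseline, and appends synthetic examples to the real support set before computing class prototypes. The Gaussian "synthetic" points are a hypothetical stand-in for embeddings of generated patches; the generator, the features, and all function names are assumptions.

```python
import numpy as np


def fit_prototypes(feats: np.ndarray, labels: np.ndarray):
    # One prototype (mean embedding) per class.
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def predict(classes: np.ndarray, protos: np.ndarray, queries: np.ndarray) -> np.ndarray:
    # Assign each query to the nearest prototype by squared Euclidean distance.
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]


def augment_support(feats, labels, synth_feats, synth_labels):
    # Append synthetic examples to the real support set.
    return np.concatenate([feats, synth_feats]), np.concatenate([labels, synth_labels])


# Toy 2-class example in a 2-D embedding space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
real_y = np.array([0, 0, 1, 1])                      # 2-shot real support set
real_x = centers[real_y] + rng.normal(size=(4, 2))
synth_y = np.repeat([0, 1], 20)                      # hypothetical generator output
synth_x = centers[synth_y] + rng.normal(size=(40, 2))

x_aug, y_aug = augment_support(real_x, real_y, synth_x, synth_y)
classes, protos = fit_prototypes(x_aug, y_aug)

query_y = np.repeat([0, 1], 50)
query_x = centers[query_y] + rng.normal(size=(100, 2))
acc = float((predict(classes, protos, query_x) == query_y).mean())
print(f"accuracy with augmented support set: {acc:.2f}")
```

In a real pipeline, the synthetic embeddings would come from encoding generated images with the same feature extractor as the real ones. Whether augmentation helps depends on how faithful the generated images are to the target classes.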
Conclusion
In conclusion, MuPD is a notable step toward integrating multimodal data in histopathology. Its ability to generate high-quality synthetic data and to improve downstream tasks such as few-shot classification highlights the potential of generative models in medicine. As demand for precise and comprehensive disease diagnosis grows, unified models like MuPD are likely to play an increasingly important role in pathology.
