A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs
Summary: arXiv:2604.04614v2
Abstract
Deep learning-based modeling of multimodal Electronic Health Records (EHRs) has become an important approach for clinical diagnosis and risk prediction. However, because of diverse clinical workflows and privacy constraints, raw EHRs are inherently multi-level incomplete: they exhibit irregular sampling, missing modalities, and sparse labels. These issues lead to temporal misalignment, modality imbalance, and limited supervision. Most existing multimodal methods assume relatively complete data, and even methods designed for incompleteness usually address only one or two of these issues in isolation. As a result, they often rely on rigid temporal or modal alignment, or discard incomplete data, and either choice can distort the semantics of the raw clinical records.
Introduction
Artificial intelligence (AI) is increasingly being integrated into clinical workflows, and predicting in-hospital mortality from Electronic Health Records (EHRs) is one setting where it can directly support patient care. The main obstacle is that EHR data are inherently incomplete.
Challenges of EHR Data
Raw EHR data are incomplete at three levels:
- Irregular Sampling: Measurements are recorded at uneven, patient-specific intervals, so variables and patients do not share a common timeline.
- Missing Modalities: Not every patient has every type of data (for example, particular lab results or imaging studies), which leaves the modalities imbalanced.
- Sparse Labels: Only a subset of records carry outcome labels, which limits the supervision available for training.
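To make the irregular-sampling problem concrete, the toy sketch below forces a few irregularly timed readings onto a fixed hourly grid, which is the kind of rigid temporal alignment the abstract criticizes. The records, variable names, and the "keep the last value per cell" rule are all hypothetical choices for illustration; nothing here comes from the paper's data.

```python
# Toy records for one patient: (hours since admission, variable, value).
# All values are made up for illustration only.
events = [
    (0.2, "heart_rate", 88.0),
    (0.7, "heart_rate", 95.0),   # a second reading in the same hour
    (3.9, "lactate", 2.1),
    (7.5, "heart_rate", 101.0),
]
variables = ["heart_rate", "lactate"]
n_hours = 8

# Rigid alignment: every variable gets one slot per hour.
grid = {v: [None] * n_hours for v in variables}
collisions = 0
for t, var, val in events:
    slot = int(t)  # hour bin (timestamps are non-negative)
    if grid[var][slot] is not None:
        collisions += 1  # an earlier reading in this hour gets overwritten
    grid[var][slot] = val  # keep only the last value per cell

cells = len(variables) * n_hours
empty = sum(v is None for row in grid.values() for v in row)
print(f"{empty}/{cells} grid cells empty, {collisions} reading(s) overwritten")
# prints: 13/16 grid cells empty, 1 reading(s) overwritten
```

Most cells would then have to be imputed, and the 0.2 h heart-rate reading is silently lost. Keeping each reading as its own timestamped event avoids both effects.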
Proposed Solution: HealthPoint (HP)
To tackle these challenges, we propose HealthPoint (HP), a unified clinical point cloud paradigm for multi-level incomplete EHRs. HP represents each heterogeneous clinical event as a point in a continuous 4D space defined by:
- Content: The clinical event or observation itself (e.g., a lab value).
- Time: When the event occurred.
- Modality: The type of data the event comes from (e.g., lab results, imaging studies).
- Case: The patient case the event belongs to.
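A minimal sketch of what this representation could look like as a data structure, assuming nested per-case, per-modality event lists as input. The `ClinicalPoint` class, the `to_point_cloud` helper, and the `MODALITIES` vocabulary are hypothetical names; the paper's actual encoding of each axis (for instance, how content is embedded) is not described in this summary.

```python
from dataclasses import dataclass

# Hypothetical modality vocabulary (assumption, not from the paper).
MODALITIES = {"lab": 0, "vital": 1, "note": 2, "image": 3}


@dataclass
class ClinicalPoint:
    content: tuple   # what was observed; a real model would embed this (assumed)
    time: float      # continuous timestamp in hours, never binned
    modality: int    # index into MODALITIES
    case: int        # which patient case the point belongs to


def to_point_cloud(records):
    """Flatten {case_id: {modality: [(time, content), ...]}} into one list of points.

    A missing modality simply contributes no points, so nothing is padded or imputed.
    """
    points = []
    for case_id, by_modality in records.items():
        for modality, events in by_modality.items():
            for t, content in events:
                points.append(ClinicalPoint(content, float(t), MODALITIES[modality], case_id))
    return points


records = {
    0: {"lab": [(3.9, ("lactate", 2.1))],
        "vital": [(0.2, ("heart_rate", 88.0)), (0.7, ("heart_rate", 95.0))]},
    1: {"note": [(12.0, ("note", "dyspnea overnight"))]},  # case 1 has no labs or vitals
}
cloud = to_point_cloud(records)
print(len(cloud))  # prints 4
```

Because every point carries its own time, modality, and case coordinates, cases with different numbers of events and different subsets of modalities fit into the same unordered set without alignment.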
Modeling Interactions
To model interactions between arbitrary pairs of points, we introduce a Low-Rank Relational Attention mechanism, which efficiently captures high-order dependencies across the four dimensions. We further develop a hierarchical interaction and sampling strategy that balances fine-grained modeling against computational cost.
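This summary does not give the exact formulation of Low-Rank Relational Attention, so the following is only one plausible, simplified reading. It adds a rank-r relational bias, computed from each point's concatenated content, time, modality, and case encodings, to standard scaled dot-product attention, and it uses per-(case, modality) subsampling as a simple stand-in for the hierarchical sampling strategy. The sinusoidal time encoding, the one-hot modality and case encodings, and all function names are assumptions. The bias is bilinear, so it models pairwise relations only, not the full high-order dependencies the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)


def time_features(t, n_freq=4):
    """Sinusoidal encoding of continuous time (plain dot products depend on time differences)."""
    freqs = 1.0 / (10.0 ** np.arange(n_freq))  # hypothetical frequency schedule
    ang = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)


def point_features(content_emb, t, modality, case, n_mod, n_case):
    """Concatenate per-axis encodings (content, time, modality, case) into one vector per point."""
    return np.concatenate(
        [content_emb, time_features(t), np.eye(n_mod)[modality], np.eye(n_case)[case]], axis=1
    )


def low_rank_relational_attention(x, z, Wq, Wk, Wv, U, V):
    """Dot-product attention plus the bias z_i^T (U^T V) z_j, whose relation matrix has rank <= r."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits += (z @ U.T) @ (z @ V.T).T           # (N, r) @ (r, N): O(r) work per pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return attn @ v


def sample_per_group(modality, case, max_per_group, rng):
    """Keep at most max_per_group randomly chosen points per (case, modality) group."""
    keep = []
    for c, m in set(zip(case.tolist(), modality.tolist())):
        idx = np.flatnonzero((case == c) & (modality == m))
        if len(idx) > max_per_group:
            idx = rng.choice(idx, size=max_per_group, replace=False)
        keep.extend(idx.tolist())
    return np.sort(np.array(keep))


# Toy usage with random inputs.
N, d, r = 12, 8, 2
content = rng.normal(size=(N, d))
t = rng.uniform(0, 48, size=N)
modality = rng.integers(0, 3, size=N)
case = rng.integers(0, 2, size=N)

keep = sample_per_group(modality, case, max_per_group=2, rng=rng)
z = point_features(content[keep], t[keep], modality[keep], case[keep], n_mod=3, n_case=2)
D = z.shape[1]
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
U, V = rng.normal(size=(r, D)), rng.normal(size=(r, D))
out = low_rank_relational_attention(content[keep], z, Wq, Wk, Wv, U, V)
print(out.shape)
```

Factoring the relation matrix as U^T V reduces its parameters from D^2 to 2rD and the per-pair cost of the bias from O(D) to O(r), while subsampling bounds the number of points N that enter the quadratic attention step.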
Benefits of HealthPoint
Built on this framework, HP supports:
- Flexible event-level interaction.
- Fine-grained self-supervision.
- Robust modality recovery.
- Effective utilization of unlabeled data.
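One way these capabilities could fit into a single training objective is sketched below. It assumes a masked-point reconstruction loss (mean squared error on hidden points) for self-supervision, hides an entire modality to mimic modality recovery, and applies a mortality cross-entropy term only to labeled cases, so unlabeled cases still contribute through the reconstruction term. The loss forms, the weight `lam`, and all function names are hypothetical; the summary does not specify HealthPoint's actual objectives.

```python
import numpy as np


def masked_point_loss(pred, target, mask):
    """Mean squared error over masked points only (mask[i] is True if point i was hidden)."""
    if not mask.any():
        return 0.0
    return float(np.mean((pred[mask] - target[mask]) ** 2))


def binary_cross_entropy(prob, y, eps=1e-7):
    """Mean binary cross-entropy with clipping to avoid log(0)."""
    p = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def combined_loss(pred, target, point_mask, mort_prob, labels, has_label, lam=1.0):
    """Self-supervised term covers every point; the supervised term covers labeled cases only."""
    ssl = masked_point_loss(pred, target, point_mask)
    sup = binary_cross_entropy(mort_prob[has_label], labels[has_label]) if has_label.any() else 0.0
    return sup + lam * ssl


rng = np.random.default_rng(0)
modality = np.array([0, 0, 1, 1, 2])       # per-point modality ids
target = rng.normal(size=(5, 4))            # per-point content features
pred = target + 0.1                         # stand-in for a decoder's reconstructions
point_mask = modality == 1                  # hide one whole modality (modality-recovery style)
mort_prob = np.array([0.8, 0.5, 0.1])       # per-case predicted mortality
labels = np.array([1.0, 0.0, 0.0])          # labels[1] is a placeholder and is ignored
has_label = np.array([True, False, True])   # case 1 is unlabeled
loss = combined_loss(pred, target, point_mask, mort_prob, labels, has_label)
print(loss)
```

Keeping the supervised term behind a label mask lets labeled and unlabeled cases share one batch, so sparse labels reduce the supervised signal without discarding the unlabeled cases' data.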
Performance and Conclusion
Experiments on large-scale EHR datasets for risk prediction show that HealthPoint consistently achieves state-of-the-art performance and remains robust under varying degrees of incompleteness, which suggests it could be useful for clinical decision-making and risk assessment.
In summary, HealthPoint addresses irregular sampling, missing modalities, and sparse labels within a single framework rather than in isolation, making it a promising approach for risk prediction from incomplete EHR data.
