MedBayes-Lite: Bayesian Uncertainty Quantification for Safe Clinical Decision Support
arXiv:2511.16625v2
Abstract: We propose MedBayes-Lite, a lightweight Bayesian enhancement for transformer-based clinical language models that improves reliability through uncertainty-aware prediction. The framework operates without retraining, architectural modification, or additional trainable parameters, and integrates three components: Bayesian Embedding Calibration via Monte Carlo dropout, Uncertainty-Weighted Attention for reliability-aware token aggregation, and Confidence-Guided Decision Shaping for abstention under uncertainty. Across MedQA, PubMedQA, and MIMIC-III, MedBayes-Lite improves calibration and trustworthiness, reducing overconfidence by 32–48%. In simulated clinical settings, it further supports safer decision-making by flagging uncertain predictions for human review, particularly under distribution shift. For closed API models, the framework remains applicable through sampling-based predictive uncertainty and confidence-guided abstention, while full embedding- and attention-level uncertainty propagation is evaluated on open-weight transformer models.
Introduction
AI systems are increasingly used to support clinical decision-making, which makes their reliability paramount when they inform critical healthcare decisions. A model that is confidently wrong is more dangerous than one that signals doubt. MedBayes-Lite addresses concerns about the trustworthiness of AI predictions in clinical settings by adding a lightweight Bayesian layer that quantifies uncertainty in transformer-based clinical language models.
Key Components of MedBayes-Lite
MedBayes-Lite is designed to enhance existing transformer-based clinical language models without retraining, architectural modification, or additional trainable parameters. The framework is built on three core components:
- Bayesian Embedding Calibration: This component uses Monte Carlo dropout, which keeps dropout active at inference time and aggregates several stochastic forward passes, to estimate uncertainty at the embedding level so that the confidence assigned to a prediction corresponds more closely to its actual accuracy.
- Uncertainty-Weighted Attention: This component aggregates tokens in a way that accounts for their reliability, allowing the model to focus on trustworthy information and reducing the influence of uncertain tokens.
- Confidence-Guided Decision Shaping: This mechanism lets the model abstain when uncertainty is too high, so that a prediction is issued only when confidence is sufficient and uncertain cases are flagged for human review.
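To make the first two components concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: dropout is applied directly to a toy token embedding rather than inside a transformer, and the function names, the variance penalty on attention scores, and the `lam` parameter are hypothetical choices, since the abstract does not give the exact formulas.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_embedding(embedding, n_samples=50, p_drop=0.1, rng=None):
    """Toy Monte Carlo dropout on a single token embedding.

    Dropout stays active for every pass. Returns the per-dimension mean
    over the stochastic passes and one scalar uncertainty value (the
    variance averaged over dimensions). Hypothetical helper.
    """
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n_samples):
        # Inverted dropout: zero a unit with probability p_drop,
        # rescale the survivors by 1 / (1 - p_drop).
        samples.append([
            0.0 if rng.random() < p_drop else x / (1.0 - p_drop)
            for x in embedding
        ])
    dim = len(embedding)
    mean = [sum(s[d] for s in samples) / n_samples for d in range(dim)]
    var_per_dim = [
        sum((s[d] - mean[d]) ** 2 for s in samples) / n_samples
        for d in range(dim)
    ]
    return mean, sum(var_per_dim) / dim


def uncertainty_weighted_attention(scores, variances, lam=1.0):
    """Down-weight uncertain tokens before the softmax.

    Assumption: each raw attention score is penalized by lam * variance.
    The penalty form is illustrative only.
    """
    penalized = [s - lam * v for s, v in zip(scores, variances)]
    return softmax(penalized)
```

With equal raw scores, the token whose embedding has the largest variance receives the smallest attention weight, which is the qualitative behavior the second component describes.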
Performance and Applications
In evaluations on MedQA, PubMedQA, and MIMIC-III, MedBayes-Lite improves calibration and trustworthiness, reducing overconfidence by 32% to 48%. This matters in high-stakes medical environments, where a confidently stated but incorrect prediction can lead to severe consequences.
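Calibration of this kind is commonly measured with expected calibration error (ECE), which compares stated confidence with observed accuracy inside confidence bins. The abstract does not say which metric underlies the 32% to 48% figure, so treating ECE as the measure is an assumption here, and the function below is a generic sketch rather than the paper's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted confidence for each example, in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of examples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def relative_reduction(before, after):
    """Fractional drop of a metric, e.g. 0.32 for a 32% reduction."""
    return (before - after) / before
```

A model that reports 0.9 confidence but is right only half the time has a gap of about 0.4 in that bin. Calibration methods aim to shrink that gap, and `relative_reduction` expresses the change as a percentage-style figure.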
Implications for Clinical Decision-Making
By flagging uncertain predictions for human review, MedBayes-Lite supports a safer decision-making process in simulated clinical settings, particularly under distribution shift, where the model’s reliability is most likely to degrade. The framework also remains applicable to closed API models through sampling-based predictive uncertainty and confidence-guided abstention. Full embedding- and attention-level uncertainty propagation, however, is evaluated only on open-weight transformer models, where internal activations are accessible.
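For a closed API model, only sampled outputs are visible, so uncertainty has to be estimated from the spread of repeated answers. The sketch below shows one way this could look for a multiple-choice question: sample the model several times, turn the answers into an empirical distribution, and defer to a clinician when the normalized entropy is too high. The function names, the entropy criterion, and the `max_entropy` threshold are hypothetical, and the paper's actual abstention rule may differ.

```python
import math
from collections import Counter


def predictive_distribution(sampled_answers):
    """Empirical answer distribution from repeated stochastic samples."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return {answer: c / n for answer, c in counts.items()}


def normalized_entropy(dist, n_options):
    """Shannon entropy scaled to [0, 1] by the maximum for n_options."""
    if n_options < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(n_options)


def decide(sampled_answers, n_options, max_entropy=0.5):
    """Answer when the samples agree; otherwise flag for human review.

    max_entropy is an illustrative threshold, not a value from the paper.
    """
    dist = predictive_distribution(sampled_answers)
    entropy = normalized_entropy(dist, n_options)
    if entropy > max_entropy:
        return {"action": "defer_to_clinician", "answer": None, "entropy": entropy}
    top_answer = max(dist, key=dist.get)
    return {"action": "answer", "answer": top_answer, "entropy": entropy}
```

Ten samples that all return the same option yield zero entropy and an answer, while samples spread evenly across four options reach the maximum entropy and are deferred. In practice the threshold would need to be tuned on held-out data to balance coverage against the risk of acting on an uncertain prediction.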
Conclusion
MedBayes-Lite shows how Bayesian uncertainty quantification can be added to existing clinical AI systems at low cost. By making predictions better calibrated and by abstaining when uncertainty is high, the framework aims to improve trust in AI-assisted tools and to help clinicians make better-informed decisions, with uncertain cases routed to human review.
