From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
Hallucinations in large language models (LLMs), meaning outputs that are incorrect or unsupported by the available evidence, are a serious safety problem for medical artificial intelligence. In high-stakes settings such as diabetic retinopathy (DR) decision-making, a fluent but wrong answer can mislead clinical judgment. A preprint posted to arXiv (arXiv:2604.05348v1) addresses this problem with a new benchmark and a detection framework.
Introducing RETINA-SAFE
The study presents RETINA-SAFE, an evidence-grounded benchmark aligned with retinal grading records. It contains 12,522 samples organized into three evidence-relation tasks:
- E-Align: evidence-consistent cases.
- E-Conflict: evidence-conflicting cases.
- E-Gap: cases where the available evidence is insufficient.
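The three tasks can be read as a small label hierarchy: one evidence-consistent class and two kinds of evidence problem. The sketch below encodes that reading as a lookup table. The mapping of E-Align to "safe" and of E-Conflict and E-Gap to the two "unsafe" subtypes is our assumption for illustration, and the names `Task`, `TRIAGE_LABEL`, and `label_for` are hypothetical; the benchmark's actual labeling scheme may differ.

```python
from enum import Enum


class Task(Enum):
    """The three RETINA-SAFE evidence-relation tasks."""
    E_ALIGN = "E-Align"        # evidence-consistent case
    E_CONFLICT = "E-Conflict"  # evidence-conflicting case
    E_GAP = "E-Gap"            # evidence is insufficient


# Hypothetical mapping from task to a two-level (risk, subtype) label.
# Assumption for illustration only; not taken from the paper.
TRIAGE_LABEL = {
    Task.E_ALIGN: ("safe", None),
    Task.E_CONFLICT: ("unsafe", "contradiction"),
    Task.E_GAP: ("unsafe", "evidence_gap"),
}


def label_for(task_name: str) -> tuple:
    """Look up the (risk, subtype) pair for a task name such as 'E-Gap'."""
    return TRIAGE_LABEL[Task(task_name)]
```

For example, `label_for("E-Conflict")` returns `("unsafe", "contradiction")` under this assumed mapping.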
Proposing ECRT for Risk Triage
Alongside the benchmark, the study proposes Evidence-Conditioned Risk Triage (ECRT), a two-stage white-box detection framework for flagging risky outputs of medical LLMs. "White-box" means the detector reads the model's internal states rather than only its generated text. ECRT operates in two stages:
- Stage 1: Performs Safe/Unsafe risk triage.
- Stage 2: Refines unsafe cases into two categories: contradiction-driven risks and evidence-gap risks.
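The control flow of the two stages can be sketched as a gated classifier: Stage 1 decides Safe versus Unsafe, and only Unsafe cases are passed to Stage 2 for subtype attribution. This is a minimal sketch under stated assumptions: the two detectors are stand-ins for trained classifiers that return a probability, the 0.5 threshold is arbitrary, and the names `two_stage_triage` and `TriageResult` are hypothetical rather than the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A detector maps a feature vector to a probability in [0, 1].
Detector = Callable[[Sequence[float]], float]


@dataclass
class TriageResult:
    risk: str               # "safe" or "unsafe"
    subtype: Optional[str]  # None when safe; otherwise the Stage-2 subtype


def two_stage_triage(
    features: Sequence[float],
    stage1_unsafe_prob: Detector,
    stage2_contradiction_prob: Detector,
    threshold: float = 0.5,  # assumed decision threshold for both stages
) -> TriageResult:
    """Stage 1 decides Safe vs Unsafe; only Unsafe cases reach Stage 2."""
    if stage1_unsafe_prob(features) < threshold:
        return TriageResult("safe", None)
    # Stage 2: split unsafe cases into contradiction-driven vs evidence-gap risk.
    if stage2_contradiction_prob(features) >= threshold:
        return TriageResult("unsafe", "contradiction")
    return TriageResult("unsafe", "evidence_gap")
```

With toy detectors `lambda f: f[0]` and `lambda f: f[1]`, the input `[0.8, 0.1]` is routed to `TriageResult("unsafe", "evidence_gap")`. One consequence of gating is that a case Stage 1 misses never receives a subtype, so Stage-1 accuracy bounds the usefulness of Stage 2.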
Mechanisms Behind ECRT
ECRT draws on shifts in the model's internal representations and logits between two conditions: Context (CTX), in which the retinal evidence is provided, and No Context (NOCTX), in which it is withheld. The detector is trained with class balancing, so that less frequent classes are not drowned out by the majority class. Evaluated with evidence-grouped splits across several model backbones, ECRT performs strongly on Stage-1 risk triage while also providing explicit subtype attribution.
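The exact feature set is not specified in this summary, so the following is only a sketch of what CTX-versus-NOCTX shift features and class balancing could look like. It assumes one hidden-state vector and one next-token logit vector per condition, and uses three hypothetical features: the elementwise representation difference, the KL divergence between the CTX and NOCTX output distributions, and the drop in probability of the CTX top token once the evidence is removed. The class weights follow the common "balanced" heuristic n_samples / (n_classes * n_c), the formula scikit-learn uses for `class_weight="balanced"`, which may not match the paper's balancing scheme. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def shift_features(
    hidden_ctx: Sequence[float],
    hidden_noctx: Sequence[float],
    logits_ctx: Sequence[float],
    logits_noctx: Sequence[float],
) -> List[float]:
    """Hypothetical features: representation shift plus logit-level shifts."""
    rep_shift = [a - b for a, b in zip(hidden_ctx, hidden_noctx)]
    p_ctx, p_noctx = softmax(logits_ctx), softmax(logits_noctx)
    kl = kl_divergence(p_ctx, p_noctx)
    # How much the CTX top token loses probability when evidence is removed.
    top = max(range(len(p_ctx)), key=p_ctx.__getitem__)
    top_drop = p_ctx[top] - p_noctx[top]
    return rep_shift + [kl, top_drop]


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * n_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With three "safe" labels and one "unsafe" label, the weights are about 0.667 and 2.0, so each class contributes the same total weight (2.0) to the training loss.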
Key Findings
Notably, ECRT improves Stage-1 balanced accuracy by:
- +0.15 to +0.19 over external uncertainty and self-consistency baselines.
- +0.02 to +0.07 over the strongest adapted supervised baseline.
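Balanced accuracy is the mean of per-class recall; for the binary Safe/Unsafe decision it equals (TPR + TNR) / 2, the average of the true-positive and true-negative rates. Unlike plain accuracy, it cannot be inflated by always predicting the majority class. The minimal plain-Python version below follows that standard definition; the class names in the example are illustrative.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))  # recall for class c
    return sum(recalls) / len(recalls)
```

For example, on 90 safe and 10 unsafe cases, a detector that always answers "safe" has 0.90 accuracy but only 0.50 balanced accuracy.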
It also consistently outperforms a single-stage white-box ablation on Stage-1 balanced accuracy. According to the authors, these results indicate that internal signals grounded in retinal evidence offer a viable route to interpretable risk triage for medical LLMs.
Conclusion
RETINA-SAFE and ECRT together offer a way to measure and detect hallucination risk in medical LLMs against retinal evidence. By tying risk decisions to that evidence and improving triage accuracy over the baselines tested, they could help make AI-assisted DR decision support safer and more trustworthy for patients and clinicians.
