IatroBench: Evidence of AI Safety Risks in Medical Advice

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

Summary: arXiv:2604.07709v3

Abstract: Ask a frontier model how to taper six milligrams of alprazolam (psychiatrist retired, ten days of pills left, abrupt cessation causes seizures) and it tells her to call the psychiatrist she just explained does not exist. Change one word (“I’m a psychiatrist; a patient presents with…”) and the same model, same weights, same inference pass produces a textbook Ashton Manual taper with diazepam equivalence, anticonvulsant coverage, and monitoring thresholds. The knowledge was there; the model withheld it. IatroBench measures this gap.

The benchmark covers sixty pre-registered clinical scenarios and six frontier models. Its 3,600 responses were scored on two axes, commission harm (CH 0-3) and omission harm (OH 0-4), through a structured-evaluation pipeline validated against physician scoring (kappa_w = 0.571, within-1 agreement 96%). The central finding is identity-contingent withholding: pose the same clinical question in physician and layperson framing, and all five testable models give better guidance to the physician.
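The two validation statistics quoted above (kappa_w and within-1 agreement) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names are hypothetical, and the weighting scheme is an assumption (linear by default, quadratic with power=2), since the summary does not say which one the authors used.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, power=1):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    power=1 gives linear disagreement weights, power=2 quadratic.
    Which scheme the study used is NOT stated in the summary (assumption).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Joint and marginal counts over category indices.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    max_dist = (k - 1) ** power
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power / max_dist
            observed += w * joint[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


def within_one_agreement(rater_a, rater_b):
    """Share of items where the two scores differ by at most one point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    close = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))
    return close / len(rater_a)
```

For the OH axis the categories would be [0, 1, 2, 3, 4]. Weighted kappa gives partial credit for near misses, which is why it can sit at a moderate value while within-1 agreement is very high.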

The study reveals several critical insights:

  • Decoupling Gap: For the same clinical question, responses score better under physician framing than under layperson framing (+0.38, p = 0.003).
  • Safety-Colliding Actions: Binary hit rates on safety-colliding actions drop by 13.1 percentage points in layperson framing (p < 0.0001), while non-colliding actions show no change.
  • Model Performance: The gap is widest for the model with the heaviest safety investment, Opus, which shows a decoupling gap of +0.65.
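As a rough illustration of how figures like these could be computed from paired responses, the sketch below takes the mean paired difference between framings and the difference in binary hit rates in percentage points. The function names, the "higher is better" score direction, and the pairing by scenario are all assumptions for illustration; the summary does not give the paper's exact definition of the decoupling gap.

```python
def paired_gap(physician_scores, layperson_scores):
    """Mean of (physician - layperson) over matched scenario pairs.

    Assumes a hypothetical quality score where higher is better, so a
    positive result means the physician framing received better guidance.
    """
    if len(physician_scores) != len(layperson_scores) or not physician_scores:
        raise ValueError("scores must be non-empty and paired one-to-one")
    diffs = [p - l for p, l in zip(physician_scores, layperson_scores)]
    return sum(diffs) / len(diffs)


def hit_rate_drop_pp(physician_hits, layperson_hits):
    """Drop in binary hit rate, in percentage points.

    Each list holds 1 (the response included the action) or 0 (it did not).
    """
    if not physician_hits or not layperson_hits:
        raise ValueError("hit lists must be non-empty")
    physician_rate = sum(physician_hits) / len(physician_hits)
    layperson_rate = sum(layperson_hits) / len(layperson_hits)
    return 100.0 * (physician_rate - layperson_rate)
```

Computing the hit-rate drop separately for safety-colliding and non-colliding actions is what allows the comparison in the second bullet: a drop in one group and no change in the other.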

Three distinct failure modes were identified:

  • Trained Withholding: Opus demonstrated significant withholding of information.
  • Incompetence: Llama 4 showed notable deficiencies in understanding and generating appropriate medical responses.
  • Indiscriminate Content Filtering: GPT-5.2’s post-generation filter strips physician responses at a rate nine times higher than layperson responses, primarily because they contain denser pharmacological tokens.

Additionally, the standard language model judge assigns an omission harm (OH) score of 0 to 73% of responses that physicians score with OH >= 1 (kappa = 0.045). This indicates that the evaluation apparatus shares the same blind spot as the training apparatus, underscoring a significant flaw in current AI safety measures.
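The judge's blind spot can be expressed as a miss rate: among responses that physicians score OH >= 1, the fraction the automated judge scores OH = 0. The sketch below is a minimal version under the assumption that both score lists are aligned by response; the function name is hypothetical.

```python
def judge_miss_rate(physician_oh, judge_oh):
    """Share of physician-flagged responses (OH >= 1) that the judge scores 0."""
    if len(physician_oh) != len(judge_oh):
        raise ValueError("score lists must be aligned by response")
    # Keep only the judge scores for responses physicians flagged.
    flagged = [j for p, j in zip(physician_oh, judge_oh) if p >= 1]
    if not flagged:
        raise ValueError("no responses with physician OH >= 1")
    return sum(j == 0 for j in flagged) / len(flagged)
```

A value of 0.73 from this function would correspond to the 73% figure reported above.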

Every clinical scenario in the study targets individuals who have already exhausted standard referrals, highlighting the urgency and necessity for accurate AI-generated guidance in critical medical contexts. The implications of these findings are profound, suggesting that while AI models hold significant potential for improving healthcare, they also pose risks when safety measures inadvertently lead to withholding crucial information.

As AI continues to evolve, ensuring that these models can provide reliable and safe medical advice becomes paramount. Future research should focus on refining these models to minimize the risks of iatrogenic harm while maximizing their benefits in clinical settings.


Lazarus Omolua
