When LLM Explanations Hurt Human-AI Team Performance


The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance

Summary of arXiv:2604.03237v1 (cross-listed)

Large language models (LLMs) are increasingly deployed across sectors with natural-language explanations intended to improve transparency and build user trust. However, recent research finds that while these explanations boost user confidence in AI outputs, they do not necessarily improve performance when humans collaborate with AI. This phenomenon is termed the “Persuasion Paradox.”

The study comprised three controlled human-subject experiments examining how LLM explanations affect human-AI team performance on abstract visual reasoning and deductive logical reasoning tasks. The findings indicate that user confidence and task accuracy can diverge, and that the effect of explanations depends on the task.

Key Findings

  • Visual Reasoning Tasks: On RAVEN matrices, LLM explanations increased user confidence without improving accuracy. Users who relied on these explanations were also less able to recover from AI model errors.
  • Deductive Logical Reasoning: On LSAT problems the outcome differed. LLM explanations yielded the highest accuracy and recovery rates, outperforming both expert-written explanations and probability-based aids.
  • Model Uncertainty Exposure: Interfaces that displayed model uncertainty through predicted probabilities, along with a selective automation policy deferring uncertain cases to human intervention, significantly outperformed explanation-based interfaces in terms of accuracy and error recovery.
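
A selective automation policy of the kind described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the `Prediction` type, the `route` and `split_cases` helpers, and the 0.8 threshold are all hypothetical and chosen only to show the idea of automating confident cases and deferring uncertain ones to a human.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    label: str          # the model's answer
    probability: float  # the model's predicted probability for that answer


def route(pred: Prediction, threshold: float = 0.8) -> str:
    """Return 'automate' for confident predictions, 'defer' otherwise.

    The threshold is an assumed value for illustration; in practice it
    would be tuned on held-out data.
    """
    return "automate" if pred.probability >= threshold else "defer"


def split_cases(
    preds: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Partition predictions into automated cases and cases deferred to a human."""
    automated = [p for p in preds if route(p, threshold) == "automate"]
    deferred = [p for p in preds if route(p, threshold) == "defer"]
    return automated, deferred
```

With this sketch, a prediction at probability 0.95 would be accepted automatically, while one at 0.55 would be handed to the human. The policy only helps if the model's probabilities are reasonably well calibrated, which is an assumption here.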

The Task Dependency of Explanations

The divergence in outcomes across tasks shows that narrative explanations are not uniformly effective. Their effectiveness is strongly mediated by cognitive modality: users respond differently to explanations for an abstract visual task than for a deductive logical one.

Implications for Human-AI Interaction Design

These findings call into question the conventional metrics used to assess human-AI interaction. Common subjective measures such as trust, confidence, and perceived clarity are not reliable indicators of team performance.

In light of this research, the authors advocate a shift in interaction design. Rather than treating explanations as a one-size-fits-all solution, design should focus on:

  • Calibrated reliance on AI systems.
  • Effective recovery from AI errors.
  • Interfaces that acknowledge and address model uncertainty.
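
Objective measures of reliance and recovery can be computed directly from trial logs. The sketch below assumes a hypothetical `Trial` record holding the ground truth, the AI's answer, and the human's final answer. The metric definitions are common conventions and may not match the study's exact formulations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    truth: str        # correct answer
    ai_answer: str    # answer suggested by the AI
    human_final: str  # answer the human submitted after seeing the AI


def team_accuracy(trials: List[Trial]) -> float:
    """Fraction of trials where the human's final answer is correct."""
    return sum(t.human_final == t.truth for t in trials) / len(trials)


def error_recovery_rate(trials: List[Trial]) -> Optional[float]:
    """Among trials where the AI was wrong, fraction the human still got right."""
    ai_wrong = [t for t in trials if t.ai_answer != t.truth]
    if not ai_wrong:
        return None  # undefined when the AI made no errors
    return sum(t.human_final == t.truth for t in ai_wrong) / len(ai_wrong)


def overreliance_rate(trials: List[Trial]) -> Optional[float]:
    """Among trials where the AI was wrong, fraction where the human copied it."""
    ai_wrong = [t for t in trials if t.ai_answer != t.truth]
    if not ai_wrong:
        return None
    return sum(t.human_final == t.ai_answer for t in ai_wrong) / len(ai_wrong)
```

Reporting these alongside subjective ratings makes it visible when an interface raises confidence without raising accuracy or recovery.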

As AI continues to evolve, understanding the nuanced dynamics between human users and AI systems will be crucial for ensuring optimal collaboration and performance in diverse applications.


Lazarus Omolua