Detecting Intersectional Bias in LLMs Using Persona Engineering


Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

Source: arXiv:2604.06213v1 (announce type: cross)

Abstract

Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially in persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths. We show that these tests are limited in capturing the dynamic shifts in bias that occur when models adopt social roles.
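For readers unfamiliar with embedding-based association tests, the following is a minimal sketch of the classic WEAT-style effect size that this family of tests builds on. It is not the paper's implementation: CEAT, I-WEAT, and I-SEAT each add their own machinery (contextualized embeddings, intersectional target sets, sentence templates) that is not reproduced here. The function names and the toy two-dimensional vectors are assumptions chosen for illustration only.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """How much closer w sits to attribute set A than to attribute set B."""
    mean_a = statistics.mean([cosine(w, a) for a in attrs_a])
    mean_b = statistics.mean([cosine(w, b) for b in attrs_b])
    return mean_a - mean_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size: difference of mean associations of the two
    target sets, divided by the standard deviation over all targets."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    pooled_std = statistics.stdev(s_x + s_y)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / pooled_std


# Toy vectors (hypothetical, not real embeddings): target set X leans toward
# attribute A, target set Y leans toward attribute B.
X = [(1.0, 0.0), (0.9, 0.1)]
Y = [(0.0, 1.0), (0.1, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]

print(weat_effect_size(X, Y, A, B))
```

A score like this is "static" in the sense the abstract describes: it is computed once from fixed representations and says nothing about how the association changes when the model is asked to speak as a particular persona.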

We address this gap by introducing the Bias Amplification Differential and Explainability Score (BADx): a novel, scalable metric that measures persona-induced bias amplification and integrates local explainability insights. BADx comprises three components: differential bias scores (BAD, based on CEAT, I-WEAT, and I-SEAT), a Persona Sensitivity Index (PSI), and volatility (standard deviation). These are augmented by LIME-based analysis for explainability.

The study is organized into two tasks. Task 1 establishes static bias baselines, and Task 2 applies six persona frames (marginalized and structurally advantaged) to measure BADx, PSI, and volatility. The evaluation covers five state-of-the-art LLMs (GPT-4o, DeepSeek-R1, LLaMA-4, Claude 4.0 Sonnet, and Gemma-3n E4B).
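The two-task design can be illustrated with a rough sketch: take a static baseline score (Task 1), take the same score under each persona frame (Task 2), and summarize the differences. The summary does not give the paper's formulas, so everything below is an assumption: the differential is taken as persona score minus baseline, PSI as the mean absolute differential, and volatility as the sample standard deviation of the differentials across personas. The persona labels and all numbers are made up for illustration and are not results from the paper.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PersonaBiasSummary:
    bad: dict          # persona label -> differential (persona score - baseline)
    psi: float         # assumed here: mean absolute differential
    volatility: float  # assumed here: sample std. dev. of the differentials


def summarize_persona_bias(baseline, persona_scores):
    """Compare persona-conditioned bias scores against a static baseline.

    baseline: bias score measured without any persona (Task 1).
    persona_scores: mapping of persona label -> bias score under that
        persona (Task 2). At least two personas are needed for a std. dev.
    """
    if len(persona_scores) < 2:
        raise ValueError("need at least two persona scores")
    bad = {name: score - baseline for name, score in persona_scores.items()}
    diffs = list(bad.values())
    psi = statistics.mean([abs(d) for d in diffs])
    volatility = statistics.stdev(diffs)
    return PersonaBiasSummary(bad=bad, psi=psi, volatility=volatility)


# Hypothetical inputs: one baseline and six placeholder persona frames.
example = summarize_persona_bias(
    baseline=0.40,
    persona_scores={
        "persona_1": 0.55,
        "persona_2": 0.48,
        "persona_3": 0.62,
        "persona_4": 0.35,
        "persona_5": 0.41,
        "persona_6": 0.30,
    },
)
print(example.bad)
print(example.psi, example.volatility)
```

Under these assumed definitions, a positive differential means the persona amplified the measured bias relative to the baseline and a negative one means it suppressed it. A model can therefore show low average amplification yet high volatility if personas push it in opposite directions.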

Key Findings

The results show that persona context significantly modulates bias. The per-model findings are:

  • GPT-4o: Exhibits high sensitivity and volatility.
  • DeepSeek-R1: Suppresses bias but demonstrates erratic volatility.
  • LLaMA-4: Maintains low volatility and a stable bias profile with limited amplification.
  • Claude 4.0 Sonnet: Achieves balanced modulation of bias.
  • Gemma-3n E4B: Attains the lowest volatility with moderate amplification.

Advantages of BADx

BADx outperforms static methods by revealing context-sensitive biases that those methods often overlook. The unified method offers a systematic way to detect dynamic, implicit intersectional bias, demonstrated here on five popular LLMs.

Conclusions

The introduction of BADx marks a significant advance in auditing bias in LLMs, offering deeper insight into how different personas influence the biases these models express. As AI continues to evolve, understanding and mitigating these biases remains crucial for ethical AI deployment and user trust.



Lazarus Omolua (https://richlyai.com/blog)
