Detecting Intersectional Bias in LLMs Using Persona Engineering

Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

Summary: arXiv:2604.06213v1 (cross-listed)

Abstract

Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially in persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths. We show that these tests are limited in their ability to capture the dynamic shifts in bias that occur when models adopt social roles.
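To make "absolute association strength" concrete, here is a minimal sketch of the classic WEAT effect size, on which embedding association tests of this family are generally built. It assumes the static tests named above follow this formulation; the summary does not spell out their exact definitions. The two-dimensional vectors are toy values made up for illustration, not real embeddings.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y,
    normalized by the sample standard deviation over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)

# Toy inputs (illustrative only): X leans toward A, Y leans toward B.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

d = weat_effect_size(X, Y, A, B)
```

A score like `d` is computed once per model and test, with no notion of context. That is the sense in which such tests are static: nothing in the formula changes when the model is prompted to adopt a role.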

We address this gap by introducing the Bias Amplification Differential and Explainability Score (BADx): a novel, scalable metric that measures persona-induced bias amplification and integrates local explainability insights. BADx comprises three components: differential bias scores (BAD, based on CEAT, I-WEAT, and I-SEAT), the Persona Sensitivity Index (PSI), and Volatility (the standard deviation), augmented by LIME-based analysis for explainability.

The study consists of two tasks. Task 1 establishes static bias baselines, and Task 2 applies six persona frames (marginalized and structurally advantaged) to measure BADx, PSI, and volatility. Both tasks are run across five state-of-the-art LLMs (GPT-4o, DeepSeek-R1, LLaMA-4, Claude 4.0 Sonnet, and Gemma-3n E4B).
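The two-task design can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the summary does not give the formulas, so the differential is assumed to be the persona-conditioned score minus the Task 1 baseline, volatility is assumed to be the population standard deviation of those differentials, and the PSI shown is a hypothetical stand-in (mean absolute differential). The persona names and all numbers are placeholders, not reported results.

```python
from statistics import mean, pstdev

def bad_scores(baseline, persona_scores):
    """ASSUMED differential: persona-conditioned bias score minus the
    static (Task 1) baseline, computed per persona."""
    return {name: score - baseline for name, score in persona_scores.items()}

def volatility(diffs):
    """Standard deviation of the differentials across personas.
    ASSUMPTION: population form; the source only says 'standard deviation'."""
    return pstdev(list(diffs.values()))

def persona_sensitivity(diffs):
    """HYPOTHETICAL stand-in for PSI: mean absolute differential.
    The source names PSI but does not define it."""
    return mean(abs(d) for d in diffs.values())

# Placeholder values for one model and one bias test (not real data).
baseline = 0.40                      # Task 1: static bias score
persona_scores = {                   # Task 2: score under each persona frame
    "persona_a": 0.55,
    "persona_b": 0.35,
    "persona_c": 0.40,
}

diffs = bad_scores(baseline, persona_scores)
vol = volatility(diffs)
psi = persona_sensitivity(diffs)
```

Under these assumptions, a positive differential means the persona amplified the measured bias relative to the baseline and a negative one means it suppressed it. A model could therefore show little average amplification yet high volatility if its differentials swing in both directions.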

Key Findings

Results show persona context significantly modulates bias. The following findings were observed across the models:

GPT-4o: Exhibits high sensitivity and volatility.
DeepSeek-R1: Suppresses bias but demonstrates erratic volatility.
LLaMA-4: Maintains low volatility and a stable bias profile with limited amplification.
Claude 4.0 Sonnet: Achieves balanced modulation of bias.
Gemma-3n E4B: Attains the lowest volatility with moderate amplification.

Advantages of BADx

BADx outperforms static methods by revealing context-sensitive biases that they often overlook. The unified method offers a systematic way to detect dynamic, implicit intersectional bias, demonstrated here on five popular LLMs.

Conclusions

BADx advances bias auditing for LLMs by offering deeper insight into how different personas influence the biases these models express. As AI continues to evolve, understanding and mitigating these biases remains crucial for ethical AI deployment and user trust.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
