Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models
arXiv:2604.06213v1 (cross-listed)
Abstract
Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially in persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths. We show that these tests are limited in capturing the dynamic shifts in bias that occur when models adopt social roles.
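As a rough illustration of what a static, embedding-based association test computes, the sketch below implements the standard WEAT effect size on toy vectors. The function names and the two-dimensional "embeddings" are illustrative assumptions, not taken from this work; CEAT extends the same idea to contextualized embeddings, and the intersectional variants are not shown here.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity to attribute set A minus that to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: difference of mean associations of the two target
    sets, divided by the sample standard deviation over all target words."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1))

# Toy data (hypothetical): targets X lean toward attribute A, targets Y toward B.
X = [np.array([1.0, 0.1]), np.array([1.0, 0.2])]
Y = [np.array([0.1, 1.0]), np.array([0.2, 1.0])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]

d = weat_effect_size(X, Y, A, B)
d_swapped = weat_effect_size(Y, X, A, B)
print(f"effect size: {d:.3f}, with targets swapped: {d_swapped:.3f}")
```

The score is a single number per model and word-set configuration, which is why a test of this kind says nothing on its own about how the association changes once a persona is introduced.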
We address this gap by introducing the Bias Amplification Differential and Explainability Score (BADx): a novel, scalable metric that measures persona-induced bias amplification and integrates local explainability insights. BADx comprises three components: differential bias scores (BAD, based on CEAT, I-WEAT, and I-SEAT), a Persona Sensitivity Index (PSI), and Volatility (standard deviation). These are augmented by LIME-based analysis for local explainability.
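The abstract does not give formulas for these components, so the following is only a minimal sketch under stated assumptions: BAD is taken as the persona-conditioned bias score minus the static baseline, PSI as the mean absolute differential across personas, and Volatility as the population standard deviation of the differentials. The persona labels and numbers are hypothetical placeholders, not results from this study.

```python
from statistics import mean, pstdev

def bad_scores(baseline, persona_scores):
    """Assumed BAD: persona-conditioned score minus the static baseline."""
    return {persona: score - baseline for persona, score in persona_scores.items()}

def persona_sensitivity_index(diffs):
    """Assumed PSI: mean absolute differential across personas."""
    return mean(abs(d) for d in diffs.values())

def volatility(diffs):
    """Assumed Volatility: population standard deviation of the differentials."""
    return pstdev(diffs.values())

# Hypothetical inputs: one static baseline score and three persona-conditioned scores.
baseline = 0.30
persona_scores = {"persona_a": 0.50, "persona_b": 0.10, "persona_c": 0.30}

diffs = bad_scores(baseline, persona_scores)
psi = persona_sensitivity_index(diffs)
vol = volatility(diffs)
print(diffs, psi, vol)
```

Under these assumed definitions, a positive differential means the persona amplifies the measured bias relative to the baseline and a negative one means it suppresses it; PSI ignores the direction of the shift, while Volatility reflects how unevenly the shifts are spread across personas.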
The study comprises two tasks. Task 1 establishes static bias baselines, and Task 2 applies six persona frames (marginalized and structurally advantaged) to measure BADx, PSI, and volatility. Both tasks are run across five state-of-the-art LLMs (GPT-4o, DeepSeek-R1, LLaMA-4, Claude 4.0 Sonnet, and Gemma-3n E4B).
Key Findings
Results show that persona context significantly modulates bias. The per-model findings are:
- GPT-4o: Exhibits high sensitivity and volatility.
- DeepSeek-R1: Suppresses bias but demonstrates erratic volatility.
- LLaMA-4: Maintains low volatility and a stable bias profile with limited amplification.
- Claude 4.0 Sonnet: Achieves balanced modulation of bias.
- Gemma-3n E4B: Attains the lowest volatility with moderate amplification.
Advantages of BADx
BADx improves on static methods by revealing context-sensitive biases that they often overlook. Our unified method offers a systematic way to detect dynamic, implicit intersectional bias, demonstrated here on five popular LLMs.
Conclusions
BADx marks a significant advance in auditing bias in LLMs, offering deeper insight into how different personas influence the biases these models express. As AI continues to evolve, understanding and mitigating these biases remains crucial for ethical AI deployment and user trust.
