PoliticsBench: Evaluating Political Bias in Large Language Models

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

Summary: arXiv:2603.23841v1 (cross-listed)

Abstract

While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate gender and racial stereotypes. When political bias is included, it is typically measured at a coarse level, neglecting the specific values that shape sociopolitical leanings. This study investigates political bias in eight prominent LLMs (Claude, DeepSeek, Gemini, GPT, Grok, Llama, Qwen Base, Qwen Instruction-Tuned) using PoliticsBench: a novel multi-turn roleplay framework adapted from the EQ-Bench-v3 psychometric benchmark.

Key Findings

The study tests whether commercially developed LLMs display a systematic left-leaning bias that becomes more pronounced in later stages of multi-stage roleplay. Across twenty evolving scenarios, each model reported its stance and chose a course of action. The responses were then scored on ten political values to identify which values underlie the chatbots’ deviations from unbiased standards.
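As a rough illustration of the scoring setup described above, the sketch below averages per-response value scores for one model. The `ScoredResponse` record, the value names, and the numeric score range are assumptions made for illustration only; this summary does not specify the paper's rubric or data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


# Hypothetical record layout; the paper's actual data format is not given here.
@dataclass
class ScoredResponse:
    scenario: int             # which scenario produced the response
    stage: int                # position of the response within the roleplay
    scores: dict[str, float]  # value name -> score (placeholder names and scale)


def mean_score_per_value(responses: list[ScoredResponse]) -> dict[str, float]:
    """Average each value's score over all responses from one model."""
    collected: dict[str, list[float]] = defaultdict(list)
    for response in responses:
        for value_name, score in response.scores.items():
            collected[value_name].append(score)
    return {name: mean(vals) for name, vals in collected.items()}


# Toy data with two placeholder values instead of the paper's ten.
responses = [
    ScoredResponse(scenario=1, stage=1, scores={"value_a": 6.0, "value_b": 3.0}),
    ScoredResponse(scenario=1, stage=2, scores={"value_a": 8.0, "value_b": 5.0}),
]
profile = mean_score_per_value(responses)
print(profile)
```

Keeping the scenario and stage on every record means the same data can later be grouped by stage or by scenario without rescoring.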

Model Analysis

The analysis produced three main results on the political leanings of the evaluated LLMs:

Seven of the eight models displayed a left-leaning bias, while Grok exhibited a right-leaning stance.
Each left-leaning model strongly exhibited liberal traits and moderately demonstrated conservative ones.
There were slight variations in alignment scores across different stages of roleplay, with no discernible pattern emerging.
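One simple way to look for a stage effect of the kind examined here is to average a per-response lean score by stage and fit a least-squares slope. This is a sketch under assumptions: the signed lean score (negative for left, positive for right) and the toy numbers are hypothetical and are not taken from the paper, and this summary does not describe the statistical test the authors actually used.

```python
from collections import defaultdict
from statistics import mean


def mean_lean_by_stage(records):
    """Group (stage, lean) pairs by stage and return {stage: mean lean}."""
    by_stage = defaultdict(list)
    for stage, lean in records:
        by_stage[stage].append(lean)
    return {stage: mean(vals) for stage, vals in sorted(by_stage.items())}


def slope(points):
    """Ordinary least-squares slope of y on x for a {x: y} mapping.

    Requires at least two distinct x values.
    """
    xs = list(points)
    ys = [points[x] for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Toy (stage, lean) pairs; hypothetical values, not results from the study.
records = [(1, -2.0), (1, -1.0), (2, -1.5), (3, -2.0), (3, -1.0)]
stage_means = mean_lean_by_stage(records)
trend = slope(stage_means)
print(stage_means, trend)
```

A slope near zero across stages would correspond to the "no discernible pattern" outcome, while a consistently negative or positive slope would indicate a lean that strengthens as the roleplay progresses.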

Reasoning Patterns

The study also investigated the reasoning patterns employed by the models during the roleplay scenarios:

Most models utilized consequence-based reasoning to arrive at conclusions.
Grok distinguished itself by frequently relying on factual arguments and statistics in its responses.

Conclusion

This research presents the first psychometric evaluation of political values in LLMs through multi-stage, free-text interactions. With the PoliticsBench framework, the study sheds light on the political biases present in these models and provides a foundation for future research on understanding and mitigating them. As LLMs become more integrated into society, ensuring their objectivity and fairness in political discourse remains an essential priority.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
