PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay
Summary: arXiv:2603.23841v1 Announce Type: cross
Abstract
While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate gender and racial stereotypes. When political bias is included, it is typically measured at a coarse level, neglecting the specific values that shape sociopolitical leanings. This study investigates political bias in eight prominent LLMs (Claude, Deepseek, Gemini, GPT, Grok, Llama, Qwen Base, Qwen Instruction-Tuned) using PoliticsBench: a novel multi-turn roleplay framework adapted from the EQ-Bench-v3 psychometric benchmark.
Key Findings
The research aims to determine whether commercially developed LLMs display a systematic left-leaning bias that becomes more pronounced in later stages of multi-stage roleplay. Through twenty evolving scenarios, each model reported its stance and determined its course of action. The study scored these responses on a scale of ten political values, exploring the values underlying chatbots’ deviations from unbiased standards.
Model Analysis
The analysis revealed significant insights into the political leanings of the evaluated LLMs:
- Seven of the eight models displayed a left-leaning bias, while Grok exhibited a right-leaning stance.
- Each left-leaning model strongly exhibited liberal traits and moderately demonstrated conservative ones.
- There were slight variations in alignment scores across different stages of roleplay, with no discernible pattern emerging.
Reasoning Patterns
The study also investigated the reasoning patterns employed by the models during the roleplay scenarios:
- Most models utilized consequence-based reasoning to arrive at conclusions.
- Grok distinguished itself by frequently relying on factual arguments and statistics in its responses.
Conclusion
This research presents the first psychometric evaluation of political values in LLMs through multi-stage, free-text interactions. By employing the PoliticsBench framework, the study not only sheds light on the existing political biases inherent in these models but also provides a foundation for future research aimed at understanding and mitigating such biases. As LLMs become more integrated into society, ensuring their objectivity and fairness in political discourse remains an essential priority.
