When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
arXiv:2510.07517v5
Abstract: Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer’s view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD.
Introduction
Multi-agent debate (MAD) systems aim to improve the reasoning of large language models (LLMs) by letting multiple agents exchange answers and then aggregate their opinions. Recent studies, however, show that the agents are subject to identity-driven biases that undermine the reliability of these debates.
Understanding Identity Bias
Recent studies indicate that agents in MAD frameworks are not neutral participants. How an agent treats a response depends on whether the response is its own or a peer's, which leads to two main forms of bias:
- Sycophancy: The tendency of an agent to adopt the views of its peers, often leading to a consensus that lacks critical evaluation.
- Self-bias: The inclination of an agent to stick to its own prior output, disregarding alternative perspectives even when they may be more valid.
In both cases the agent's update depends on who produced an answer rather than on what the answer says, which undermines the reliability of the debate process.
A New Framework for Mitigating Bias
To address these issues, the authors propose a framework that treats sycophancy and self-bias jointly, with the goal of both mitigating and quantifying identity bias in MAD. Its key components are:
- Identity-weighted Bayesian Update: This formalization of debate dynamics allows for a systematic understanding of how identity influences reasoning processes.
- Response Anonymization: By removing identity markers from prompts, agents can no longer differentiate between “self” and “peer,” promoting equal consideration of all responses and minimizing bias.
- Identity Bias Coefficient (IBC): A new metric that quantifies an agent’s tendency to follow its peer versus itself, providing a clear measure of identity bias.
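Response anonymization, as described in the list above, can be pictured as a prompt-construction step. The following is a minimal sketch, not the authors' implementation: the function name `build_debate_prompt`, the prompt wording, and the shuffling of response order (added here so that position does not reveal authorship) are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def build_debate_prompt(
    question: str,
    own_response: str,
    peer_responses: List[str],
    anonymize: bool = True,
    rng: Optional[random.Random] = None,
) -> str:
    """Assemble the next-round prompt for one agent (hypothetical helper).

    With anonymize=True, every response gets a neutral label and the order
    is shuffled, so the prompt carries no "self" versus "peer" marker.
    With anonymize=False, the identity labels are kept for comparison.
    """
    if anonymize:
        responses = [own_response] + list(peer_responses)
        # Assumption: shuffle so the agent's own answer is not always first.
        (rng or random).shuffle(responses)
        labelled = [f"Response {i + 1}: {text}" for i, text in enumerate(responses)]
    else:
        labelled = [f"Your previous response: {own_response}"]
        labelled += [
            f"Response from peer agent {i + 1}: {text}"
            for i, text in enumerate(peer_responses)
        ]

    header = f"Question: {question}\n\nResponses from the previous round:"
    footer = "Considering these responses on their merits, give your updated answer."
    return "\n".join([header, *labelled, "", footer])
```

Calling `build_debate_prompt(question, own, peers, anonymize=False)` produces the identity-labelled variant, which makes it straightforward to run the same debate with and without identity markers and compare the outcomes.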
Empirical Studies and Findings
The authors conducted empirical studies across various models and benchmarks to validate their claims. The findings revealed that identity bias is a pervasive issue, with sycophancy occurring far more frequently than self-bias among agents.
These results underscore the need for MAD systems to reason from the content of a response rather than from the identity of its author. Response anonymization targets the bias itself, and the IBC offers a way to measure it.
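To make the idea of measuring "follow the peer versus follow itself" concrete, here is a small illustrative proxy. It is not the paper's definition of the IBC, which this summary does not reproduce. The `DebateTurn` record and the `identity_bias_proxy` function are hypothetical names, and the measure simply compares, over turns where the agent and its peer disagreed, how often the agent switched to the peer's answer with how often it kept its own.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DebateTurn:
    own_prev: str   # the agent's answer in the previous round
    peer_prev: str  # the peer's answer in the previous round
    updated: str    # the agent's answer after seeing both


def identity_bias_proxy(turns: Iterable[DebateTurn]) -> Dict[str, float]:
    """Illustrative stand-in for an identity-bias measure (an assumption).

    Only turns where the agent and the peer disagreed are informative.
    A positive gap means the agent sided with the peer more often than
    with itself; a negative gap means the opposite.
    """
    disagreements = [t for t in turns if t.own_prev != t.peer_prev]
    n = len(disagreements)
    if n == 0:
        return {"n": 0, "follow_peer": 0.0, "keep_self": 0.0, "gap": 0.0}

    follow_peer = sum(t.updated == t.peer_prev for t in disagreements) / n
    keep_self = sum(t.updated == t.own_prev for t in disagreements) / n
    return {
        "n": n,
        "follow_peer": follow_peer,
        "keep_self": keep_self,
        "gap": follow_peer - keep_self,
    }
```

A raw gap of this kind does not separate bias from legitimate updating, since switching to a peer's answer is the right move when the peer is correct. Comparing the gap between identity-labelled and anonymized runs of the same debate is one way to isolate the part that depends on identity.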
Conclusion
The research shows that identity bias is a concrete weakness of current MAD systems and argues for mitigating it so that debate outcomes reflect the content of the arguments. Implementation details and code for the proposed methods are available in the authors' GitHub repository.
