Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
Summary: arXiv:2604.02923v1
Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations—generating plausible but factually incorrect content—and exhibit systematic biases that are amplified by uneven expert activation during inference.
Introduction
Advances in LLMs have reshaped natural language processing, enabling applications that range from chatbots to content generation. These models still have well-known flaws. The most pressing is hallucination: output that reads as coherent but is factually wrong. Biases inherited from training data skew results further and undermine the reliability of these systems.
The Council Mode Framework
To address these limitations, we propose the Council Mode, a novel multi-agent consensus framework that dispatches queries to multiple heterogeneous frontier LLMs in parallel and synthesizes their outputs through a dedicated consensus model. This approach aims to enhance the accuracy and fairness of responses generated by LLMs.
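The parallel dispatch described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the expert models are hypothetical async stubs (`stub_model_a`, `stub_model_b`), `synthesize` is a placeholder that only concatenates drafts where a real system would call a dedicated consensus model, and tolerating a failed expert is a design choice made here for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A "model" is any async callable mapping a prompt to a text answer.
# Real deployments would wrap vendor API clients; these are hypothetical stubs.
Model = Callable[[str], Awaitable[str]]


async def stub_model_a(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"draft from A for {prompt!r}"


async def stub_model_b(prompt: str) -> str:
    await asyncio.sleep(0.02)
    return f"draft from B for {prompt!r}"


async def failing_model(prompt: str) -> str:
    raise TimeoutError("hypothetical provider timeout")


async def synthesize(prompt: str, drafts: Dict[str, str]) -> str:
    # Placeholder for the consensus model: it only labels and joins the drafts.
    lines = [f"[{name}] {text}" for name, text in sorted(drafts.items())]
    return "\n".join(lines)


async def council(prompt: str, experts: Dict[str, Model]) -> str:
    names = list(experts)
    # Send the same prompt to every expert concurrently.
    results = await asyncio.gather(
        *(experts[name](prompt) for name in names),
        return_exceptions=True,  # one failing expert should not sink the council
    )
    drafts = {
        name: result
        for name, result in zip(names, results)
        if not isinstance(result, BaseException)
    }
    if not drafts:
        raise RuntimeError("every expert failed")
    return await synthesize(prompt, drafts)


if __name__ == "__main__":
    experts = {"a": stub_model_a, "b": stub_model_b, "c": failing_model}
    print(asyncio.run(council("Who wrote Dune?", experts)))
```

Because the experts run concurrently, total latency is bounded by the slowest expert that responds plus the synthesis step, rather than by the sum of all calls.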
Operational Phases
The Council pipeline operates in three distinct phases:
- Intelligent Triage Classifier: This component routes queries based on complexity, ensuring that each question is directed to the most suitable model for processing.
- Parallel Expert Generation: In this phase, architecturally diverse models generate responses simultaneously, leveraging their unique strengths to provide a more robust output.
- Structured Consensus Synthesis: The final phase involves a consensus mechanism that explicitly identifies agreement, disagreement, and unique findings among the outputs before producing the final response.
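To make the triage and consensus steps concrete, here is a toy sketch. It rests on strong simplifying assumptions that are not taken from the paper: triage is a word-count heuristic with a hypothetical threshold, and each expert's output has already been reduced to a `topic -> value` mapping so that agreement, disagreement, and unique findings can be found by plain comparison. In the framework described above, a dedicated consensus model performs this synthesis; the names `triage`, `build_report`, and `ConsensusReport` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ConsensusReport:
    # topic -> value that every responding expert gave
    agreement: Dict[str, str] = field(default_factory=dict)
    # topic -> {expert: value} where the experts gave different values
    disagreement: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # topic -> (expert, value) raised by exactly one expert
    unique: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def triage(query: str, max_simple_words: int = 8) -> str:
    """Crude complexity proxy (an assumption): short queries skip the council."""
    return "single" if len(query.split()) <= max_simple_words else "council"


def build_report(claims_by_expert: Dict[str, Dict[str, str]]) -> ConsensusReport:
    # Regroup the claims so each topic lists what every expert said about it.
    by_topic: Dict[str, Dict[str, str]] = defaultdict(dict)
    for expert, claims in claims_by_expert.items():
        for topic, value in claims.items():
            by_topic[topic][expert] = value

    report = ConsensusReport()
    for topic, votes in by_topic.items():
        if len(votes) == 1:
            (expert, value), = votes.items()
            report.unique[topic] = (expert, value)
        elif len(set(votes.values())) == 1:
            report.agreement[topic] = next(iter(votes.values()))
        else:
            report.disagreement[topic] = dict(votes)
    return report


if __name__ == "__main__":
    claims = {
        "m1": {"capital": "Paris", "population": "2.1M"},
        "m2": {"capital": "Paris", "population": "2.2M", "river": "Seine"},
    }
    print(triage("What is the capital of France?"))
    print(build_report(claims))
```

Keeping the three categories separate lets the final response state agreed points plainly, flag contested ones, and treat single-source claims with more caution.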
Evaluation and Results
We implemented and evaluated the Council Mode architecture within an open-source AI workspace. Our comprehensive evaluation across multiple benchmarks demonstrates the effectiveness of this approach:
- A 35.9% relative reduction in hallucination rates on the HaluEval benchmark.
- A 7.8-point improvement on the TruthfulQA benchmark compared to the best-performing individual model.
- Significantly lower bias variance across various domains.
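The first two results use different units: the HaluEval figure is a relative reduction, while the TruthfulQA figure is an absolute difference in points. The short sketch below shows how a relative reduction is computed. The 20% baseline hallucination rate is a made-up value chosen only to illustrate the arithmetic; it is not a number reported in the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def rate_after_reduction(baseline: float, reduction: float) -> float:
    """Rate that remains after applying a relative reduction to `baseline`."""
    return baseline * (1.0 - reduction)


if __name__ == "__main__":
    hypothetical_baseline = 0.20  # assumed for illustration, NOT from the paper
    remaining = rate_after_reduction(hypothetical_baseline, 0.359)
    print(f"remaining rate: {remaining:.4f}")
    print(f"relative reduction: {relative_reduction(hypothetical_baseline, remaining):.3f}")
```

Under that assumed baseline, a 35.9% relative reduction corresponds to an absolute drop of about 7.2 percentage points, which is why the two kinds of figures should not be compared directly.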
Conclusion
The Council Mode framework mitigates hallucination and bias in LLMs through multi-agent consensus. Our findings indicate that the method improves factual accuracy and promotes fairness across domains. We provide the mathematical formulation of the consensus mechanism, detail the system architecture, and present extensive empirical results with ablation studies as a basis for future research.
