Bayesian Social Deduction Using Graph-Based Language Models

Bayesian Social Deduction with Graph-Informed Language Models

Source: arXiv:2506.17788v2

Abstract: Social reasoning – inferring unobservable beliefs and intentions from partial observations of other agents – remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model, while using an LLM for language understanding and interaction. Our approach achieves competitive performance with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study – achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, which can be found at https://camp-lab-purdue.github.io/bayesian-social-deduction/.

Introduction

Large language models (LLMs) have become highly capable at understanding and generating human-like text. They still struggle, however, with social reasoning: inferring the unobservable beliefs and intentions of other agents from partial observations of their behavior.

Challenges in Social Reasoning

To probe the limits of current reasoning language models, the authors evaluated them in the social deduction game Avalon, which requires players to deduce the hidden motivations and intentions of others. The largest models performed strongly, but with two significant limitations:

  • Extensive test-time inference was necessary for optimal performance.
  • Performance degraded sharply when models were distilled to smaller versions, which are more suitable for real-time applications.

A Hybrid Reasoning Framework

In response to these challenges, the researchers introduced a novel hybrid reasoning framework. This approach separates belief inference from the language understanding task:

  • A structured probabilistic model is employed for belief inference.
  • An LLM is utilized for language understanding and interaction.

According to the authors, this combination performs competitively with much larger models in agent-agent play.
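To make the idea of externalized belief inference concrete, here is a minimal sketch of a Bayesian update over hidden roles. It is not the authors' model: it enumerates every possible set of evil players instead of using a graph-structured model, and it only handles quest outcomes, not dialogue. The function names (`init_beliefs`, `update_on_quest`, `marginal_evil`) and the `p_sabotage` parameter are hypothetical, and the sketch assumes a single fail card fails a quest.

```python
from itertools import combinations


def init_beliefs(players, n_evil):
    """Uniform prior over every possible set of evil players."""
    hypotheses = [frozenset(c) for c in combinations(players, n_evil)]
    prior = 1.0 / len(hypotheses)
    return {h: prior for h in hypotheses}


def update_on_quest(beliefs, team, failed, p_sabotage=0.8):
    """Bayes update after observing whether a quest failed.

    p_sabotage is an assumed (hypothetical) probability that an evil
    player on the team plays a fail card. Good players never fail a quest.
    """
    team = set(team)
    posterior = {}
    for hypothesis, prior in beliefs.items():
        n_evil_on_team = len(hypothesis & team)
        # Quest fails if at least one evil team member sabotages.
        p_fail = 1.0 - (1.0 - p_sabotage) ** n_evil_on_team
        likelihood = p_fail if failed else 1.0 - p_fail
        posterior[hypothesis] = prior * likelihood
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: w / total for h, w in posterior.items()}


def marginal_evil(beliefs, players):
    """Per-player probability of being evil, summed over hypotheses."""
    return {p: sum(w for h, w in beliefs.items() if p in h) for p in players}


if __name__ == "__main__":
    players = ["A", "B", "C", "D", "E"]
    beliefs = init_beliefs(players, n_evil=2)
    beliefs = update_on_quest(beliefs, team=["A", "B"], failed=True)
    for player, prob in marginal_evil(beliefs, players).items():
        print(f"{player}: P(evil) = {prob:.3f}")
```

In a hybrid setup like the one described, the LLM's role would be to turn free-form conversation into structured evidence for a model of this kind and to phrase the agent's replies; the belief arithmetic itself stays outside the LLM.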

Achievements and Future Work

The most notable result is that, according to the authors, this is the first language agent to defeat human players in a controlled study. It achieved a 67% win rate and received higher qualitative ratings than both the reasoning baselines and human teammates.

To support future work on social reasoning in LLM agents, the authors have released their code, models, and a dataset at https://camp-lab-purdue.github.io/bayesian-social-deduction/.

Conclusion

Pairing Bayesian belief inference with a graph-informed language model is a promising direction for improving social reasoning in AI. The reported results suggest that moving belief inference into a structured probabilistic model can help language agents reason about hidden roles and intentions without relying on the largest models or extensive test-time inference.


Lazarus Omolua, https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.