Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games
Summary: arXiv:2604.11741v1 Announce Type: new
Abstract: Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which requires inferring hidden truths from partial clues provided by roles with different intentions.
To address this challenge, we propose a collaborative multi-agent framework for evaluating and synthesizing high-quality, role-driven multiplayer game scripts, enabling fine-grained interaction patterns tailored to character identities (i.e., murderer vs. innocent). Our system generates rich multimodal contexts, including character backstories, visual and textual clues, and multi-hop reasoning chains, through coordinated agent interactions.
Key Features of the Proposed Framework
- Multi-Agent Collaboration: Multiple agents work together to create and evaluate game scripts, and the resulting scripts are used to improve the reasoning capabilities of VLMs.
- Role-Driven Interaction: The agents are designed to reflect different character roles within the game, allowing for interactions that are more realistic and contextually relevant.
- Rich Multimodal Contexts: The framework generates detailed backstories and clues, integrating both visual and textual elements to ensure a comprehensive understanding of the game’s narrative.
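As a rough illustration of what a role-driven multimodal script could contain, the sketch below models characters, clues, and a multi-hop reasoning chain as plain Python data classes. Every class and field name here (Character, Clue, ReasoningStep, Script, image_path, truthful, and so on) is hypothetical; this summary does not describe the framework's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    name: str
    role: str        # assumed to be "murderer" or "innocent"
    backstory: str


@dataclass
class Clue:
    clue_id: str
    text: str
    image_path: Optional[str] = None  # None for a text-only clue
    holder: Optional[str] = None      # character who reveals the clue
    truthful: bool = True             # False marks a deceptive clue


@dataclass
class ReasoningStep:
    uses: List[str]   # ids of the clues this hop relies on
    inference: str    # conclusion drawn at this hop


@dataclass
class Script:
    title: str
    characters: List[Character]
    clues: List[Clue]
    chain: List[ReasoningStep] = field(default_factory=list)

    def murderer(self) -> Character:
        """Return the single character whose role is 'murderer'."""
        culprits = [c for c in self.characters if c.role == "murderer"]
        if len(culprits) != 1:
            raise ValueError("expected exactly one murderer")
        return culprits[0]

    def dangling_clue_ids(self) -> List[str]:
        """Clue ids referenced by the chain but absent from the script."""
        known = {c.clue_id for c in self.clues}
        return [cid for step in self.chain for cid in step.uses if cid not in known]


def build_example() -> Script:
    """Build a tiny made-up script for illustration only."""
    return Script(
        title="The Locked Study",
        characters=[
            Character("Butler", "murderer", "Served the family for years."),
            Character("Gardener", "innocent", "Was pruning roses that evening."),
        ],
        clues=[
            Clue("c1", "Muddy footprints lead to the study.", holder="Gardener"),
            Clue("c2", "The window was locked all night.",
                 image_path="clues/window.png", holder="Butler", truthful=False),
        ],
        chain=[
            ReasoningStep(uses=["c1", "c2"],
                          inference="The footprints contradict the Butler's claim."),
        ],
    )
```

A check such as `dangling_clue_ids()` is one simple way an evaluating agent might confirm that every reasoning hop is grounded in a clue that actually exists in the script.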
Training Strategy
We design a two-stage agent-monitored training strategy to enhance the reasoning ability of VLMs:
- Chain-of-Thought Based Fine-Tuning: The model is first fine-tuned on curated and synthetic datasets that model uncertainty and deception, so that it learns to reason step by step through complex scenarios.
- GRPO-Based Reinforcement Learning: The model is then trained with Group Relative Policy Optimization (GRPO) using agent-monitored reward shaping, which encourages character-specific reasoning behaviors and effective multimodal multi-hop inference.
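The core idea of GRPO is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The minimal sketch below shows that group-relative normalization together with one possible form of shaped reward. The `shaped_reward` function, its `monitor_score` input, and the `weight` parameter are assumptions made for illustration; the paper's actual reward design is not given in this summary. Using the population standard deviation is likewise a choice made here, and implementations differ on it.

```python
from statistics import mean, pstdev
from typing import List


def shaped_reward(answer_correct: bool, monitor_score: float,
                  weight: float = 0.5) -> float:
    """Hypothetical shaped reward: task correctness plus a weighted
    score in [0, 1] from a monitoring agent that rates the reasoning."""
    if not 0.0 <= monitor_score <= 1.0:
        raise ValueError("monitor_score must lie in [0, 1]")
    return (1.0 if answer_correct else 0.0) + weight * monitor_score


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt:
    advantage_i = (r_i - mean(r)) / (std(r) + eps)."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group of four sampled answers could be scored with `shaped_reward` and then passed to `group_relative_advantages`; responses that beat the group mean receive positive advantages and the rest receive negative ones. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.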
Experimental Results
Extensive experiments demonstrate that our method significantly boosts the performance of VLMs in several critical areas:
- Narrative reasoning
- Hidden fact extraction
- Deception-resilient understanding
Conclusion
Our contributions offer a scalable solution for training and evaluating VLMs under uncertain, adversarial, and socially complex conditions. This work lays the groundwork for future benchmarks in multimodal multi-hop reasoning under imperfect information, particularly in interactive and narrative-driven environments.
