Theory of Mind for Human-Agent Collaboration Tasks

Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration

Source: arXiv:2507.02935v3

Introduction

Successful human-agent collaboration hinges on an agent’s ability to understand instructions from a human principal. Instructions are often incomplete or ambiguous, so the agent must infer the principal’s unspoken intentions from the shared context. Doing so requires Theory of Mind (ToM), the ability to infer the mental states of others, here those of the human counterpart. This article summarizes the idea and introduces a novel task called Instruction Inference.

The Instruction Inference Task

The Instruction Inference task is designed to evaluate ToM within a dynamic and collaborative environment. In this task, an agent aids a principal in achieving a specific goal by interpreting incomplete or ambiguous instructions. This capability is critical for effective human-agent collaboration, especially in scenarios where clear communication is not possible.

Introducing Tomcat

We present Tomcat, a large language model (LLM)-based agent designed to exhibit ToM reasoning when interpreting and responding to a principal’s instructions. Tomcat comes in two variants:

Fs-CoT: The few-shot chain-of-thought variant, in which a small number of worked examples demonstrate the structured reasoning the task requires.
CP: The commonsense prompt variant, which relies on commonsense knowledge and contextual information about the problem at hand.
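
The difference between the two variants comes down to what goes into the prompt. The following is a minimal sketch of that contrast, not Tomcat's actual prompts, which the article does not reproduce. The `Example` class, the field labels, and both builder functions are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One worked demonstration (hypothetical structure, not from the source)."""
    context: str
    instruction: str
    reasoning: str
    intent: str


def build_fs_cot_prompt(examples: List[Example], context: str, instruction: str) -> str:
    """Few-shot chain-of-thought: show worked reasoning, then pose the new case."""
    parts = []
    for ex in examples:
        parts.append(
            f"Context: {ex.context}\n"
            f"Instruction: {ex.instruction}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Intent: {ex.intent}\n"
        )
    # The new case ends at "Reasoning:" so the model continues with its own chain.
    parts.append(
        f"Context: {context}\n"
        f"Instruction: {instruction}\n"
        "Reasoning:"
    )
    return "\n".join(parts)


def build_cp_prompt(hints: List[str], context: str, instruction: str) -> str:
    """Commonsense prompt: no worked examples, only background knowledge and context."""
    hint_lines = "\n".join(f"- {hint}" for hint in hints)
    return (
        "Background knowledge:\n"
        f"{hint_lines}\n"
        f"Context: {context}\n"
        f"Instruction: {instruction}\n"
        "Intent:"
    )
```

Either string would then be sent to the underlying LLM. The model call is left out here because the article does not describe how it is made.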

Implementation of Tomcat

Both variants of Tomcat have been implemented using three leading LLMs: GPT-4o, DeepSeek-R1, and Gemma-3-27B. Running both variants on all three models lets us compare how well each model performs the Instruction Inference task.

Research Methodology

To evaluate Tomcat’s capabilities, we conducted a study involving 52 human participants. Participants were provided with the same information as the CP variant of Tomcat. We compared Tomcat and the participants on three key metrics:

Intent Accuracy: This metric assesses how accurately Tomcat and the human participants could identify the intended actions based on the given instructions.
Action Optimality: This evaluates the efficiency of the actions taken by both the agent and the human participants in reaching the goal.
Planning Optimality: This metric measures the effectiveness of the plans developed by Tomcat and the human participants in achieving the desired outcome.
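
The article names these metrics without giving formulas, so the following is only one plausible reading, sketched under stated assumptions. It assumes intent accuracy is the fraction of trials in which the inferred intent matches the intended one, and that both optimality metrics can be expressed as the ratio of an optimal cost to the cost actually incurred. The function names and the cost-ratio definition are hypothetical, not taken from the study.

```python
from typing import Sequence


def intent_accuracy(predicted: Sequence[str], intended: Sequence[str]) -> float:
    """Fraction of trials where the inferred intent equals the intended one (assumed definition)."""
    if len(predicted) != len(intended):
        raise ValueError("predicted and intended must have the same length")
    if not intended:
        return 0.0
    matches = sum(p == t for p, t in zip(predicted, intended))
    return matches / len(intended)


def optimality_ratio(optimal_cost: float, actual_cost: float) -> float:
    """Assumed score in (0, 1]: 1.0 means the actual cost equals the optimal cost."""
    if optimal_cost <= 0 or actual_cost <= 0:
        raise ValueError("costs must be positive")
    # Cap at 1.0 in case the reference cost was not truly optimal.
    return min(1.0, optimal_cost / actual_cost)
```

Under this reading, an agent that infers 3 of 4 intents correctly scores 0.75 on intent accuracy, and one that takes 8 steps where 4 would suffice scores 0.5 on the optimality ratio.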

Results and Discussion

The findings show that Tomcat, particularly in the Fs-CoT variant with GPT-4o and DeepSeek-R1, performed comparably to the human participants. This suggests that ToM reasoning has significant potential to improve human-agent collaboration. Tomcat’s ability to interpret ambiguous instructions accurately underscores the importance of developing agents that can understand human intentions.

Conclusion

The exploration of ToM through the Instruction Inference task highlights the growing capabilities of LLM-based agents in collaborative environments. As these technologies are developed and refined, the potential for effective human-agent teamwork is likely to grow, paving the way for more intuitive and adaptive interactions.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
