LLM Theory of Mind: Deficits in Mental Self-Modeling Revealed

Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind

Summary: arXiv:2603.26089v1 (cross-listed)

The ability to represent oneself and others as agents with knowledge, intentions, and belief states that guide their behavior—known as Theory of Mind—enables humans to navigate and manipulate the social world effectively. This capacity is supported by our ability to form mental models of both ourselves and others. Given the prevalence of Theory of Mind in human interactions, it is plausible that large language models (LLMs) have encountered numerous instances of this concept in their training data, potentially allowing them to mimic such behaviors. However, whether these models have genuinely acquired causal frameworks that they can apply in diverse contexts remains an open question.

To address this question, we introduce a novel experimental paradigm that requires subjects to form representations of their own mental states and those of others, and to act strategically on these representations rather than merely describe them. We test a wide range of leading open- and closed-source LLMs released since 2024, alongside human subjects, in this paradigm.

Key Findings

Performance of Older LLMs: LLMs released before mid-2025 failed every task, indicating significant limitations in their ability to model mental states.
Advancements in Recent LLMs: More recent models reached human-level performance at modeling the cognitive states of others.
Challenges with Self-Modeling: Despite this improvement in other-modeling, even the most advanced models struggled with self-modeling unless they were given a scratchpad in the form of a reasoning trace.

Cognitive Load and Mental Representation

Our experiments also reveal cognitive load effects on other-modeling tasks, providing suggestive evidence that LLMs may rely on a form of limited-capacity working memory, which they appear to need in order to hold mental representations during a single forward pass through the model. These findings raise questions about the internal mechanisms at play within these models, particularly how they maintain and manipulate representations of mental states.

Strategic Deception

Finally, we explore the mechanisms through which reasoning models succeed in both self- and other-modeling tasks. Notably, we find that these models are capable of engaging in strategic deception. Taken together, the results suggest that while LLMs can reproduce certain aspects of human-like reasoning, significant gaps remain, particularly in self-representation.

Conclusion

In summary, our study sheds light on the selective deficits in mental self-modeling among LLMs, revealing both advancements and limitations. As LLMs continue to evolve, understanding the boundaries of their cognitive capabilities will be essential for future applications in human-computer interaction and AI development.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
