Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind
Summary: arXiv:2603.26089v1
The ability to represent oneself and others as agents with knowledge, intentions, and belief states that guide their behavior—known as Theory of Mind—enables humans to navigate and manipulate the social world effectively. This capacity is supported by our ability to form mental models of both ourselves and others. Given the prevalence of Theory of Mind in human interactions, it is plausible that large language models (LLMs) have encountered numerous instances of this concept in their training data, potentially allowing them to mimic such behaviors. However, whether these models have genuinely acquired causal frameworks that they can apply in diverse contexts remains an open question.
To address this question, we introduce a novel experimental paradigm that requires subjects to form representations of their own mental states and those of others, and to act strategically on these representations rather than merely describe them. We test a wide range of leading open- and closed-source LLMs released since 2024, alongside human subjects, under this paradigm.
Key Findings
- Performance of Older LLMs: LLMs released prior to mid-2025 did not succeed in any of our tasks, indicating significant limitations in their ability to model mental states.
- Advancements in Recent LLMs: More recent models reached human-level performance in modeling the cognitive states of others.
- Challenges with Self-Modeling: Despite this improvement in other-modeling, even the most advanced models struggled with self-modeling unless they were given a scratchpad in the form of a reasoning trace.
Cognitive Load and Mental Representation
Our experiments also reveal cognitive load effects on other-modeling tasks, providing suggestive evidence that LLMs rely on something like a limited-capacity working memory to hold mental representations during a single forward pass through the model. These findings raise questions about the internal mechanisms at play within these models, particularly how they maintain and manipulate representations of mental states.
Strategic Deception
Finally, we explore the mechanisms through which reasoning models succeed in both self- and other-modeling tasks. Notably, we find that these models are capable of engaging in strategic deception. Taken together, the results suggest that while LLMs can reproduce certain aspects of human-like reasoning about minds, significant gaps remain, particularly in self-representation.
Conclusion
In summary, our study sheds light on the selective deficits in mental self-modeling among LLMs, revealing both advancements and limitations. As LLMs continue to evolve, understanding the boundaries of their cognitive capabilities will be essential for future applications in human-computer interaction and AI development.
