Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space
arXiv:2604.12016v1
Abstract: Large language models map semantically related prompts to similar internal representations — a phenomenon interpretable as attractor-like dynamics. We ask whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits analogous attractor-like behavior. We present a controlled experiment on Llama 3.1 8B Instruct, comparing hidden states of an original cognitive_core (Condition A), seven paraphrases (Condition B), and seven structurally matched controls (Condition C). Mean-pooled states at layers 8, 16, and 24 show that paraphrases converge to a tighter cluster than controls (Cohen’s d > 1.88, p < 10^{-27}, Bonferroni-corrected). Replication on Gemma 2 9B confirms cross-architecture generalizability. Ablations suggest the effect is primarily semantic rather than structural, and that structural completeness appears necessary to reach the attractor region. An exploratory experiment shows that reading a scientific description of the agent shifts internal state toward the attractor -- closer than a sham preprint -- distinguishing knowing about an identity from operating as that identity. These results provide representational evidence that agent identity documents induce attractor-like geometry in LLM activation space.
Introduction
Identity in large language models (LLMs) has drawn growing attention, in particular how these models respond to different prompts and how their internal representations behave. This article summarizes a recent paper that treats the identity of a persistent cognitive agent as an attractor in an LLM's activation space.
Research Overview
The researchers conducted a controlled experiment utilizing Llama 3.1 8B Instruct, comparing three different conditions:
- Condition A: Original cognitive_core.
- Condition B: Seven paraphrases of the cognitive_core.
- Condition C: Seven structurally matched controls.
The aim was to test whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits attractor-like behavior, analogous to the way LLMs map semantically related prompts to similar internal representations. Using mean-pooled hidden states at layers 8, 16, and 24, the authors report that paraphrases converge to a tighter cluster than controls, with Cohen’s d > 1.88 and p < 10^{-27} after Bonferroni correction.
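The comparison described above can be sketched in a few lines of Python. This is a minimal illustration on made-up toy vectors, not the authors' pipeline: the abstract does not state the distance metric or the statistical test, so cosine distance to the original document, a pooled-standard-deviation Cohen's d, and all function names here are assumptions.

```python
import math
from statistics import mean, variance


def mean_pool(token_states):
    """Average per-token hidden-state vectors into a single vector."""
    n = len(token_states)
    return [sum(col) / n for col in zip(*token_states)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; the paper may use another)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy 3-dimensional "hidden states" (hypothetical values for illustration).
original = mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])             # Condition A
paraphrases = [[0.9, 0.1, 0.0], [0.95, 0.05, 0.1], [0.85, 0.15, 0.05]]  # Condition B
controls = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.6], [0.3, 0.2, 0.9]]          # Condition C

paraphrase_dists = [cosine_distance(original, v) for v in paraphrases]
control_dists = [cosine_distance(original, v) for v in controls]

# A positive effect size means controls sit farther from the original than paraphrases.
effect = cohens_d(control_dists, paraphrase_dists)
```

In a real run, each vector would come from mean-pooling one layer's hidden states over the tokens of a document, and the comparison would be repeated per layer, which is where a multiple-comparison correction such as Bonferroni applies.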
Key Findings
Further analysis revealed:
- A replication on Gemma 2 9B reproduced the effect, which the authors take as evidence that it generalizes across architectures.
- Ablations suggest that the effect is primarily semantic rather than structural.
- Structural completeness nevertheless appears necessary to reach the attractor region.
Exploratory Experiment
In an exploratory experiment, reading a scientific description of the agent shifted the model's internal state toward the attractor, closer than reading a sham preprint did. The authors interpret this as distinguishing knowing about an identity from operating as that identity. Because the experiment is exploratory, the result is suggestive rather than confirmatory.
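One way to picture this measurement is as a change in distance to the centroid of the paraphrase cluster. The sketch below is an assumption-laden illustration: the abstract does not say how the attractor region or the shift is quantified, so the centroid, the Euclidean distance, the baseline state, and the helper `shift_toward` are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def shift_toward(baseline, conditioned, target):
    """Positive when `conditioned` is closer to `target` than `baseline` is."""
    return euclidean(baseline, target) - euclidean(conditioned, target)


# Toy 2-dimensional states (hypothetical values for illustration).
attractor = centroid([[1.0, 0.0], [0.8, 0.2]])   # stand-in for the paraphrase cluster
baseline = [0.0, 1.0]                             # state before reading any document
after_description = [0.5, 0.5]                    # after the scientific description
after_sham = [0.1, 0.9]                           # after the sham preprint

description_shift = shift_toward(baseline, after_description, attractor)
sham_shift = shift_toward(baseline, after_sham, attractor)
```

Under this framing, the reported result corresponds to `description_shift` exceeding `sham_shift`, while the description state still need not coincide with the attractor itself.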
Conclusion
The paper presents representational evidence that agent identity documents induce attractor-like geometry in LLM activation space, observed on Llama 3.1 8B Instruct and replicated on Gemma 2 9B. These findings add to the understanding of how LLMs represent persistent agent identities and suggest directions for follow-up work.
