Probing Spatial Reasoning in LLMs: From Cognition to Neural Data

From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs

Summary: arXiv:2603.26323v1

Abstract

As spatial intelligence becomes an increasingly important capability for foundation models, it remains unclear whether large language models’ (LLMs) performance on spatial reasoning benchmarks reflects structured internal spatial representations or reliance on linguistic heuristics. We address this question from a mechanistic perspective by examining how spatial information is internally represented and used.

Introduction

Understanding spatial reasoning in large language models is critical as these models become integral to applications ranging from robotics to virtual assistants. The question is whether LLMs internally represent spatial relationships, or whether they rely on language-based heuristics to solve spatial reasoning tasks.

Methodology

To investigate this, we draw on computational theories of human spatial cognition and decompose spatial reasoning into three distinct primitives:

Relational Composition: The ability to understand and manipulate relationships between objects.
Representational Transformation: The capability to change the representation of spatial information.
Stateful Spatial Updating: The process of updating spatial information based on new inputs or changes.
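To make the third primitive concrete, here is a minimal sketch of what a controlled stateful-updating item could look like. It is a hypothetical toy task, not one of the paper's task families: the names `MOVES`, `final_position`, and `make_prompt` and the prompt wording are assumptions made for illustration. The point is only that the ground-truth answer requires tracking a state that changes with each step.

```python
# Hypothetical toy task for "stateful spatial updating"; not taken from the paper.
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def final_position(steps, start=(0, 0)):
    """Apply (direction, distance) steps in order and return the end point."""
    x, y = start
    for direction, distance in steps:
        dx, dy = MOVES[direction]
        x, y = x + dx * distance, y + dy * distance
    return (x, y)


def make_prompt(steps):
    """Render the same steps as a natural-language question for a model."""
    parts = [f"walk {distance} step(s) {direction}" for direction, distance in steps]
    return "You start at the origin. You " + ", then ".join(parts) + ". Where are you now?"


steps = [("north", 2), ("east", 3), ("south", 1)]
prompt = make_prompt(steps)
answer = final_position(steps)  # ground truth used to score the model's reply
```

Because the answer is computed rather than hand-written, items like this can be generated in bulk and translated per language while keeping the underlying spatial state identical.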

We designed controlled task families for each primitive and evaluated multilingual LLMs in English, Chinese, and Arabic. This evaluation was conducted under single-pass inference to ensure consistency in our results.

Analysis Techniques

To analyze the internal representations of the LLMs, we employed several techniques:

Linear Probing: Training a simple linear classifier on a model's internal activations to test whether task-relevant information can be linearly decoded from them.
Sparse Autoencoder Based Feature Analysis: Decomposing activations into a sparse set of features to identify which features the model encodes for a given task.
Causal Interventions: Modifying internal representations and measuring how the model's output changes, which shows whether the encoded information is actually used rather than merely present.
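The first technique can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's setup: the "activations" are synthetic vectors in which a binary spatial label is planted along a hypothetical direction, and the probe is a plain logistic regression trained with stochastic gradient descent. All names (`make_activations`, `train_probe`, `accuracy`) and hyperparameters are invented for this sketch.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_activations(n, direction, strength, rng):
    """Synthetic stand-in for layer activations: Gaussian noise plus a
    label-dependent shift along `direction` (an assumption of this sketch)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        vec = [rng.gauss(0.0, 1.0) + sign * strength * d for d in direction]
        data.append((vec, label))
    return data


def train_probe(data, dim, lr=0.1, epochs=30):
    """Fit a logistic-regression probe with plain SGD."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            err = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, vec)]
            b -= lr * err
    return w, b


def accuracy(data, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    hits = 0
    for vec, label in data:
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else 0
        hits += int(predicted == label)
    return hits / len(data)


rng = random.Random(0)
direction = [0.6, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical unit vector
train = make_activations(400, direction, 2.0, rng)
test = make_activations(200, direction, 2.0, rng)
w, b = train_probe(train, dim=len(direction))
test_acc = accuracy(test, w, b)
```

High held-out accuracy shows only that the label is linearly decodable from the activations. It does not show that the model uses that information, which is why probing is paired with causal interventions.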

Findings

Our findings indicate that task-relevant spatial information is indeed encoded in the intermediate layers of the LLMs and can causally influence model behavior. However, we observed that these representations are:

Transient: They do not persist across different tasks.
Fragmented: Representations vary significantly across different task families.
Weakly Integrated: They do not strongly contribute to final predictions.
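One common way to test whether an internal representation causally influences behavior is activation patching: copy a hidden activation from one run into another and measure how the output moves. The sketch below uses a tiny hand-written network as a stand-in for an LLM. The weights, inputs, and function names are all hypothetical, and the paper's exact intervention procedure may differ.

```python
# Toy activation patching on a hand-written two-layer network.
# Everything here (weights, inputs, names) is a hypothetical illustration.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # input -> 3 hidden units
W2 = [2.0, -1.0, 0.5]                      # hidden -> scalar output


def hidden(x):
    """ReLU hidden layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Linear readout from the hidden layer."""
    return sum(w * hi for w, hi in zip(W2, h))


def patch_effect(clean_x, corrupt_x, unit):
    """Change in output when one hidden unit of the corrupted run is
    overwritten with its value from the clean run."""
    h_clean, h_corrupt = hidden(clean_x), hidden(corrupt_x)
    patched = list(h_corrupt)
    patched[unit] = h_clean[unit]
    return output(patched) - output(h_corrupt)


clean_x, corrupt_x = [1.0, 0.0], [0.0, 1.0]
effects = [patch_effect(clean_x, corrupt_x, u) for u in range(3)]
```

In this toy case, patching units 0 and 1 shifts the output while patching unit 2 does not, because unit 2 takes the same value in both runs. A small patching effect for a unit from which information can still be decoded is the kind of pattern the "weakly integrated" finding describes.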

Cross-Linguistic Analysis

Further analysis revealed mechanistic degeneracy across languages: similar behavioral performance was attained through distinct internal pathways. This suggests that the internal mechanisms may not generalize across linguistic contexts even when accuracy does.
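One simple way to quantify how much two languages share a mechanism is to compare the internal quantities found for each, for example probe directions or sets of active sparse features. The sketch below shows two such comparisons. The vectors and feature indices are made-up placeholders, not results from the paper, and the paper may use different measures.

```python
import math


def cosine(u, v):
    """Cosine similarity between two probe weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Overlap between two sets of feature indices (1.0 means identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Placeholder values for illustration only; not measurements from the paper.
probe_en = [0.9, 0.1, 0.0]
probe_ar = [0.1, 0.0, 0.9]
features_en = {3, 17, 42}
features_ar = {17, 58, 91}

direction_similarity = cosine(probe_en, probe_ar)
feature_overlap = jaccard(features_en, features_ar)
```

Low similarity on both measures combined with comparable task accuracy would be consistent with degeneracy: the same behavior reached through different internal routes.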

Conclusion

Overall, our results suggest that current LLMs exhibit limited and context-dependent spatial representations. This underscores the need for mechanistic evaluations that go beyond benchmark accuracy when assessing the spatial reasoning capabilities of these models.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
