Can LLMs Model Spatial Worlds? Maze Task Insights

Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks

Source: arXiv:2604.10690v1

Abstract: Foundation models have shown remarkable performance across diverse tasks, yet their ability to construct internal spatial world models for reasoning and planning remains unclear. We systematically evaluate the spatial understanding of large language models through maze tasks, a controlled testing context requiring multi-step planning and spatial abstraction.

Across comprehensive experiments with Gemini-2.5-Flash, GPT-5-mini, Claude-Haiku-4.5, and DeepSeek-Chat, we uncover significant discrepancies in spatial reasoning that challenge assumptions about LLM planning capabilities.

Key Findings

Performance Discrepancies: Using chain-of-thought prompting, Gemini achieves 80-86% accuracy on smaller mazes (5×5 to 7×7 grids) with tokenized adjacency representations.
Collapse in Performance: Accuracy drops to 16-34% when the mazes are presented in visual grid formats, a 2-5x gap that suggests representation-dependent rather than format-invariant spatial reasoning.
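To make the two input formats concrete, here is a minimal Python sketch that renders one small maze both as a visual grid and as a tokenized adjacency list. The maze, the character choices, and the function names `to_visual_grid` and `to_adjacency_tokens` are illustrative assumptions; the paper's exact prompt formats are not reproduced in this summary.

```python
# Hypothetical 3x3 maze for illustration only: 0 = open cell, 1 = wall.
MAZE = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
]

def to_visual_grid(maze):
    """Render the maze as rows of characters ('.' = open, '#' = wall)."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in maze)

def to_adjacency_tokens(maze):
    """List every open cell followed by the open cells one step away."""
    rows, cols = len(maze), len(maze[0])
    lines = []
    for r in range(rows):
        for c in range(cols):
            if maze[r][c]:
                continue  # walls have no entry
            neighbors = [
                (nr, nc)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols and not maze[nr][nc]
            ]
            targets = " ".join(f"({nr},{nc})" for nr, nc in neighbors)
            lines.append(f"({r},{c}) -> {targets}")
    return "\n".join(lines)

print(to_visual_grid(MAZE))
print(to_adjacency_tokens(MAZE))
```

Both strings describe the same maze. The adjacency form states connectivity explicitly, while the grid form leaves it to be inferred from character positions, which is one plausible reading of why accuracy differs between the two.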

Further Analysis

To probe spatial understanding further, we employed sequential proximity questions and compositional distance comparisons. Although the reasoning traces reached 96-99% semantic coverage, the models struggled to turn that coverage into consistent spatial computations.
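One way to score such questions is to compute reference answers with breadth-first search. The sketch below is an assumption about how ground truth could be derived, not the authors' evaluation code; the maze and the helper names `bfs_distances` and `closer_cell` are hypothetical.

```python
from collections import deque

# Hypothetical 3x3 maze for illustration only: 0 = open cell, 1 = wall.
MAZE = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
]

def bfs_distances(maze, start):
    """Shortest path length (in steps) from start to every reachable open cell."""
    rows, cols = len(maze), len(maze[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not maze[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def closer_cell(maze, origin, a, b):
    """Return whichever of a or b is fewer steps from origin, or None on a tie."""
    dist = bfs_distances(maze, origin)
    da = dist.get(a, float("inf"))
    db = dist.get(b, float("inf"))
    if da == db:
        return None
    return a if da < db else b

print(bfs_distances(MAZE, (0, 0))[(2, 2)])        # path length to the far corner
print(closer_cell(MAZE, (0, 0), (1, 1), (2, 2)))  # reference answer to a comparison question
```

A model's answer to a proximity or distance-comparison question can then be checked against these reference values.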

Independent Question Treatment

Our analysis indicates that the models tend to treat each question independently, failing to build cumulative spatial knowledge. This limitation raises critical questions about the robustness of LLMs in developing effective spatial world models.
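A simple way to test whether answers given across separate questions fit together is to check them against a property any shortest-path distance must satisfy, such as the triangle inequality. This is an illustrative consistency check, not a method described in the paper; the function name `triangle_violations` and the sample data are hypothetical.

```python
def triangle_violations(claimed):
    """Find triples (a, b, c) where the claimed d(a, c) exceeds d(a, b) + d(b, c).

    claimed maps frozenset({x, y}) to the distance a model reported for that pair.
    """
    cells = sorted({cell for pair in claimed for cell in pair})
    violations = []
    for a in cells:
        for b in cells:
            for c in cells:
                # Require three distinct cells; a < c avoids reporting mirrored triples.
                if len({a, b, c}) < 3 or not a < c:
                    continue
                ab = claimed.get(frozenset((a, b)))
                bc = claimed.get(frozenset((b, c)))
                ac = claimed.get(frozenset((a, c)))
                if ab is None or bc is None or ac is None:
                    continue  # skip triples with a missing answer
                if ac > ab + bc:
                    violations.append((a, b, c))
    return violations

# Hypothetical answers collected from three separate questions.
answers = {
    frozenset(("A", "B")): 2,
    frozenset(("B", "C")): 1,
    frozenset(("A", "C")): 5,  # inconsistent: going via B would take only 3 steps
}
print(triangle_violations(answers))
```

A model that accumulated spatial knowledge across questions would be expected to produce few such violations; answers generated independently have no such guarantee.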

Implications

The findings from our maze-solving tasks suggest that the evaluated large language models do not develop robust spatial world models. Instead, they show representation-specific and prompting-dependent reasoning that succeeds only under narrow conditions.

Conclusion

These results have significant implications for the deployment of foundation models in applications that require spatial abstraction. As the capabilities of large language models continue to evolve, understanding their limitations in spatial reasoning will be crucial for their effective application in real-world scenarios.

Future Directions

Future research should consider enhancing the spatial reasoning capabilities of LLMs through improved training methodologies and representation techniques. Exploring alternative approaches to spatial abstraction may also yield valuable insights for the development of more versatile AI systems.
