LLMs for Text-Based Exploration and Navigation Under Partial Observability
arXiv:2604.09604v1
Abstract: Exploration and goal-directed navigation in unknown layouts are central to inspection, logistics, and search-and-rescue. We ask whether large language models (LLMs) can function as text-only controllers under partial observability (without code execution, tools, or program synthesis).
Introduction
Whether LLMs can act as text-only controllers when the environment is only partially observable bears directly on their practical usefulness. This study tests whether these models can explore and navigate unknown layouts, a capability central to inspection, logistics, and search-and-rescue.
Research Framework
To assess these capabilities, we introduce a reproducible benchmark of fixed ASCII gridworlds with oracle localization, meaning the agent's position is given rather than estimated. At each step, only the local 5x5 window centered on the agent is revealed, and the model must choose one of four movement commands: UP, RIGHT, DOWN, or LEFT.
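The observation and action interface described above can be sketched as a small Python environment. This is a minimal sketch under stated assumptions, not the benchmark's code. The following are all hypothetical choices:

- the cell encoding ('#' for walls, '.' for free cells, 'A' for the agent);
- rendering cells beyond the map edge as walls;
- treating a blocked or unrecognized move as a no-op that returns False;
- the names GridWorld, observe, and step.

```python
# Minimal sketch of a partially observable ASCII gridworld.
# Hypothetical encoding: '#' = wall, '.' = free cell, 'A' = agent (rendered only).
from typing import List, Tuple

MOVES = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}
RADIUS = 2  # a 5x5 window extends 2 cells in every direction from the agent


class GridWorld:
    def __init__(self, layout: List[str], start: Tuple[int, int]):
        self.grid = [list(row) for row in layout]
        self.pos = start

    def in_bounds(self, r: int, c: int) -> bool:
        return 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])

    def observe(self) -> str:
        """Render the 5x5 window centered on the agent; off-map cells appear as '#'."""
        r0, c0 = self.pos
        rows = []
        for r in range(r0 - RADIUS, r0 + RADIUS + 1):
            row = ""
            for c in range(c0 - RADIUS, c0 + RADIUS + 1):
                if (r, c) == self.pos:
                    row += "A"
                elif self.in_bounds(r, c):
                    row += self.grid[r][c]
                else:
                    row += "#"
            rows.append(row)
        return "\n".join(rows)

    def step(self, action: str) -> bool:
        """Apply a move command; return False and stay put if it is unknown or blocked."""
        if action not in MOVES:
            return False
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not self.in_bounds(r, c) or self.grid[r][c] == "#":
            return False
        self.pos = (r, c)
        return True
```

Rendering off-map cells as walls keeps every observation a full 5x5 block, so the prompt format never changes near the map edges.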
Methodology
The evaluation covers nine contemporary LLMs spanning open and proprietary models, dense and Mixture-of-Experts (MoE) architectures, and instruction-tuned and reasoning-tuned variants. Each model is assessed on two tasks across three layouts of increasing complexity:
- Exploration: maximize the number of revealed cells.
- Navigation: reach the goal by the shortest possible path.
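One way to score the two tasks from a logged trajectory is sketched below. The helper names revealed_cells and steps_to_goal are hypothetical. The sketch assumes one trajectory entry per action, so a blocked move repeats the previous position. The study may count revealed cells or steps differently.

```python
# Hypothetical scoring helpers for the two tasks. A trajectory is the list of
# agent positions (row, col), starting position first, with one entry per action.
from typing import List, Optional, Set, Tuple

Pos = Tuple[int, int]


def revealed_cells(trajectory: List[Pos], n_rows: int, n_cols: int, radius: int = 2) -> Set[Pos]:
    """Exploration: union of all in-bounds cells seen in the 5x5 windows along the path."""
    seen: Set[Pos] = set()
    for r0, c0 in trajectory:
        for r in range(max(0, r0 - radius), min(n_rows, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(n_cols, c0 + radius + 1)):
                seen.add((r, c))
    return seen


def steps_to_goal(trajectory: List[Pos], goal: Pos) -> Optional[int]:
    """Navigation: number of actions taken before the goal is first reached, else None."""
    for i, pos in enumerate(trajectory):
        if pos == goal:
            return i
    return None
```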
Results
Performance is measured with quantitative metrics including:
- Success Rate: the fraction of episodes in which the task is completed.
- Efficiency: normalized coverage for exploration, and path length relative to the oracle path for navigation.
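The metrics could be computed roughly as follows. This is an illustrative sketch, and the study's exact normalizations are not specified here. It rests on these assumptions:

- the oracle is a breadth-first search on the fully known map;
- normalized coverage is the fraction of free cells revealed;
- all function names are hypothetical.

```python
# Hypothetical metric definitions; the study's exact normalizations may differ.
from collections import deque
from typing import List, Optional, Sequence, Set, Tuple

Pos = Tuple[int, int]


def oracle_path_length(layout: List[str], start: Pos, goal: Pos) -> Optional[int]:
    """Shortest path length on the fully known map via breadth-first search (None if unreachable)."""
    rows, cols = len(layout), len(layout[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and layout[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of episodes marked successful."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def path_length_ratio(agent_steps: int, oracle_steps: int) -> float:
    """1.0 means oracle-optimal; larger is less efficient. max(..., 1) guards start == goal."""
    return agent_steps / max(oracle_steps, 1)


def normalized_coverage(revealed: Set[Pos], layout: List[str]) -> float:
    """Fraction of the map's free cells that were revealed (assumed normalization)."""
    free = {(r, c) for r, row in enumerate(layout) for c, ch in enumerate(row) if ch != "#"}
    return len(revealed & free) / len(free) if free else 0.0
```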
A qualitative analysis complements these metrics. Reasoning-tuned models reliably complete navigation on all layouts, although their paths remain longer than the oracle's. Few-shot demonstrations in the prompt help these models substantially, reducing invalid moves and shortening paths. Classic dense instruction-tuned models, by contrast, perform inconsistently.
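To make the few-shot setting concrete, the sketch below assembles a prompt from (observation, action) demonstrations and parses the reply. The following are assumptions for illustration:

- the prompt wording;
- the function names;
- taking the last move keyword in the reply, since reasoning traces often mention several moves before answering.

Here a reply with no recognizable command would count as an invalid move. Wall collisions would need to be checked separately against the map.

```python
# Hypothetical prompt builder and reply parser; the study's actual prompt text is not given here.
import re
from typing import List, Optional, Tuple

ACTIONS = ("UP", "RIGHT", "DOWN", "LEFT")


def build_prompt(observation: str, demos: List[Tuple[str, str]]) -> str:
    """Prepend (observation, action) demonstrations to the current 5x5 view."""
    parts = ["You control agent A on a grid. '#' is a wall. Reply with one of: " + ", ".join(ACTIONS) + "."]
    for demo_obs, demo_action in demos:
        parts.append(f"Observation:\n{demo_obs}\nAction: {demo_action}")
    parts.append(f"Observation:\n{observation}\nAction:")
    return "\n\n".join(parts)


def parse_action(reply: str) -> Optional[str]:
    """Return the last whole-word move command in the reply, or None (an invalid move)."""
    matches = re.findall(r"\b(UP|RIGHT|DOWN|LEFT)\b", reply.upper())
    return matches[-1] if matches else None
```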
Observations
The experiments also reveal characteristic action priors, notably a preference for UP and RIGHT, that can induce looping under partial observability. Moreover, training regime and test-time deliberation predict control ability better than raw parameter count.
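Both observations can be checked on logged episodes with simple diagnostics. The sketch below is illustrative rather than the study's analysis code. It computes the per-episode action distribution, where a skew toward UP and RIGHT would show up directly. It also flags a loop when any cell recurs too often within a recent window. The window size and revisit threshold are hypothetical parameters.

```python
# Illustrative diagnostics, not the study's code.
from collections import Counter
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def action_histogram(actions: List[str]) -> Dict[str, float]:
    """Share of each move command in an episode (empty dict for an empty episode)."""
    counts = Counter(actions)
    total = len(actions)
    return {a: counts[a] / total for a in ("UP", "RIGHT", "DOWN", "LEFT")} if total else {}


def detect_loop(trajectory: List[Pos], window: int = 8, max_revisits: int = 3) -> bool:
    """Flag a loop if any cell appears more than max_revisits times in the last `window` positions.
    This catches both oscillation between cells and repeated bumping into the same wall."""
    recent = Counter(trajectory[-window:])
    return any(n > max_revisits for n in recent.values())
```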
Conclusion
These findings point to a practical route for deploying LLMs under partial observability. Lightweight hybrids that pair an LLM with a classical online planner appear to be a viable way to improve efficiency in partial-map systems. More broadly, the study clarifies both the potential and the limitations of LLMs as controllers in real-world exploration and navigation tasks.
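A lightweight LLM-planner hybrid might look like the sketch below. It is a hypothetical design, not the authors' system. The controller accepts the LLM's move when it leads to a known free cell and no loop has been detected. Otherwise it falls back to classical frontier-based exploration: a breadth-first search over known free cells to the nearest frontier, meaning a free cell adjacent to unexplored space. The known-map representation, a dictionary from (row, col) to '.' or '#', is an assumption.

```python
# Hypothetical hybrid controller: trust the LLM's move when it is safe on the known map,
# otherwise fall back to a classical frontier-based BFS planner.
from collections import deque
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]
MOVES = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}


def frontier_move(known: Dict[Pos, str], start: Pos) -> Optional[str]:
    """BFS over known free cells to the nearest frontier (a free cell next to an unknown cell).
    Returns the first move on that path, or None if no frontier is reachable."""
    parent: Dict[Pos, Optional[Tuple[Pos, str]]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        r, c = cur
        if cur != start and any((r + dr, c + dc) not in known for dr, dc in MOVES.values()):
            # Walk back along parent links to recover the first action taken from start.
            while parent[cur][0] != start:
                cur = parent[cur][0]
            return parent[cur][1]
        for name, (dr, dc) in MOVES.items():
            nxt = (r + dr, c + dc)
            if known.get(nxt) == "." and nxt not in parent:
                parent[nxt] = (cur, name)
                queue.append(nxt)
    return None


def hybrid_action(llm_action: Optional[str], known: Dict[Pos, str], pos: Pos, looping: bool) -> Optional[str]:
    """Accept the LLM's move if it targets a known free cell and no loop is flagged."""
    if llm_action in MOVES and not looping:
        dr, dc = MOVES[llm_action]
        if known.get((pos[0] + dr, pos[1] + dc)) == ".":
            return llm_action
    return frontier_move(known, pos)
```

In this design the planner intervenes only on invalid or looping proposals. The LLM therefore still drives most decisions, while the fallback limits the cost of its action priors.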
