EVGeoQA: Benchmarking LLMs on Dynamic, Multi-Objective Geo-Spatial Exploration
arXiv:2604.07070v1
Introduction
Recent Large Language Models (LLMs) have shown strong reasoning capabilities, but their application in dynamic geo-spatial environments remains largely unexplored. Traditional Geo-Spatial Question Answering (GSQA) benchmarks focus mainly on static retrieval, which does not capture the complexity of real-world planning. This article introduces EVGeoQA, a benchmark that addresses these limitations through Electric Vehicle (EV) charging scenarios with a dual-objective, location-anchored design.
Understanding EVGeoQA
EVGeoQA is designed to evaluate LLMs on geo-spatial planning rather than static lookup. The benchmark has two defining features:
- Dynamic Queries: Each query is explicitly tied to a user’s real-time geographical coordinates.
- Dual Objectives: The benchmark integrates two critical objectives: the necessity for vehicle charging and the preference for co-located activities.
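To make the two features concrete, here is a minimal sketch of what a location-anchored, dual-objective query could look like in code. It is an illustration only, not the benchmark's actual schema: the `Place` and `Query` types, the `satisfying_stations` function, and the radius values are all hypothetical, and "co-located" is approximated here as a straight-line distance threshold.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    category: str  # hypothetical labels, e.g. "charging_station" or "cafe"


@dataclass(frozen=True)
class Query:
    user_lat: float  # the query is anchored to the user's current position
    user_lon: float
    activity: str    # category of the preferred co-located activity


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def satisfying_stations(query: Query, places: List[Place],
                        search_km: float = 5.0,
                        colocated_km: float = 0.3) -> List[str]:
    """Return charging stations that meet both objectives, nearest first.

    Objective 1: the station lies within search_km of the user.
    Objective 2: a place of the requested activity lies within colocated_km
    of the station. Both thresholds are assumptions made for this sketch.
    """
    stations = [p for p in places if p.category == "charging_station"]
    activities = [p for p in places if p.category == query.activity]
    hits = []
    for s in stations:
        d_user = haversine_km(query.user_lat, query.user_lon, s.lat, s.lon)
        if d_user > search_km:
            continue
        if any(haversine_km(s.lat, s.lon, a.lat, a.lon) <= colocated_km
               for a in activities):
            hits.append((d_user, s.name))
    return [name for _, name in sorted(hits)]
```

Because the answer depends on `user_lat` and `user_lon`, the same question asked from a different position can return a different set of stations, which is what distinguishes this setting from static retrieval.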
The GeoRover Evaluation Framework
To assess LLMs in this setting, the benchmark is paired with GeoRover, an evaluation framework built on a tool-augmented agent architecture. It systematically evaluates LLM capabilities in:
- Dynamic exploration of geo-spatial environments.
- Handling multi-objective tasks, where several goals must be satisfied at once.
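A tool-augmented agent of this kind is typically driven by a loop in which the model picks a tool, receives an observation, and decides what to do next. The sketch below shows that general pattern under stated assumptions. It is not GeoRover's implementation: the `run_agent` loop, the `search_nearby` tool, the grid-shaped `WORLD`, and the `scripted_policy` that stands in for an LLM are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, dict]          # (tool name, keyword arguments)
Record = Tuple[str, dict, object]  # (tool name, arguments, observation)


def run_agent(policy: Callable[[List[Record]], Action],
              tools: Dict[str, Callable],
              max_steps: int = 10) -> Tuple[Optional[object], List[Record]]:
    """Generic tool-use loop: ask the policy for an action, run the tool,
    append the observation to the history, and stop on "finish"."""
    history: List[Record] = []
    for _ in range(max_steps):
        name, args = policy(history)
        if name == "finish":
            return args.get("answer"), history
        if name not in tools:
            observation = f"unknown tool: {name}"
        else:
            observation = tools[name](**args)
        history.append((name, args, observation))
    return None, history  # step budget exhausted without an answer


# Toy environment (an assumption): grid cell -> place categories found there.
WORLD = {
    (0, 0): [],
    (1, 0): ["charging_station"],
    (2, 0): ["charging_station", "cafe"],
}


def search_nearby(x: int, y: int) -> List[str]:
    """Hypothetical tool: list the place categories in one grid cell."""
    return WORLD.get((x, y), [])


def scripted_policy(history: List[Record]) -> Action:
    """Stand-in for an LLM: walk east until one cell satisfies both goals."""
    if not history:
        return "search_nearby", {"x": 0, "y": 0}
    _, args, obs = history[-1]
    if "charging_station" in obs and "cafe" in obs:
        return "finish", {"answer": (args["x"], args["y"])}
    return "search_nearby", {"x": args["x"] + 1, "y": args["y"]}
```

Calling `run_agent(scripted_policy, {"search_nearby": search_nearby})` walks the toy grid until a cell offers both a charging station and a cafe. The `max_steps` budget matters for evaluation: an agent that explores inefficiently runs out of steps before it finds an answer.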
Key Findings from Experiments
The experiments show mixed performance. The LLMs use tools effectively to solve sub-tasks, but they struggle with long-range spatial exploration. An emergent capability was also observed:
- LLMs demonstrated the ability to summarize historical exploration trajectories, which significantly improved their efficiency in exploration tasks.
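One way to see why a trajectory summary helps is to compress the raw step history into a compact record of what has been visited and found, then use it to avoid re-exploring the same places. The sketch below is a deterministic stand-in for that idea, not the behaviour observed in the paper, where the LLM produces the summary itself. The grid cells and the names `summarize_trajectory` and `unvisited_neighbors` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
Step = Tuple[Cell, List[str]]  # (cell visited, place categories seen there)


def summarize_trajectory(history: List[Step]) -> dict:
    """Collapse a step-by-step history into visited cells and findings.

    Repeated visits to the same cell collapse into a single entry, so the
    summary grows with the area covered rather than with the step count.
    """
    visited: Set[Cell] = set()
    found: Dict[str, Set[Cell]] = {}
    for cell, categories in history:
        visited.add(cell)
        for category in categories:
            found.setdefault(category, set()).add(cell)
    return {
        "visited": visited,
        "found": {k: sorted(v) for k, v in found.items()},
    }


def unvisited_neighbors(cell: Cell, summary: dict) -> List[Cell]:
    """Return the 4-connected neighbours that the summary has not covered."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in candidates if c not in summary["visited"]]
```

An agent that consults such a summary before each move can skip cells it has already covered, which is one plausible mechanism behind the reported gain in exploration efficiency.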
Conclusion
EVGeoQA advances the evaluation of geo-spatial intelligence in LLMs. By focusing on real-time location and multi-objective planning, it provides a challenging testbed for future research in this domain. The dataset and prompts are publicly available at https://github.com/Hapluckyy/EVGeoQA/ for researchers and developers who want to experiment with and improve geo-spatial AI applications.
