EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports
arXiv:2604.12320v1
Video large language models (Video-LLMs) have demonstrated significant proficiency in analyzing slow-paced, real-world egocentric videos, but their performance in the fast-paced, information-rich environment of esports remains largely unexplored. To address this gap, researchers have introduced EgoEsportsQA, a video question-answering (QA) benchmark designed to evaluate both perception and reasoning in expert esports contexts.
The Need for a Specialized Benchmark
Existing egocentric benchmarks focus mainly on everyday activities and scenarios, leaving cognitive reasoning in the dynamic settings of esports largely unevaluated. EgoEsportsQA aims to bridge this gap with a framework that tests the capabilities of Video-LLMs in high-velocity virtual environments.
Key Features of EgoEsportsQA
The benchmark comprises 1,745 curated QA pairs sourced from professional matches across three popular first-person shooter games. Each question is classified along a two-dimensional taxonomy:
- Cognitive Capability Dimension: 11 sub-tasks that encompass various levels of perception and reasoning.
- Esports Knowledge Dimension: 6 sub-tasks focusing on the specialized knowledge required in competitive gaming.
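One way to picture the two-dimensional taxonomy is as a record that carries one label per dimension. The sketch below is a minimal illustration, not the benchmark's actual schema: the `QAItem` fields, the helper `taxonomy_counts`, and the sub-task labels such as `"visual_perception"` and `"map_control"` are all hypothetical placeholders, since the summary does not name the individual sub-tasks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """One QA pair tagged along both taxonomy dimensions.

    All field names are assumptions made for illustration.
    """
    video_id: str
    question: str
    answer: str
    cognitive_task: str   # one of the 11 cognitive-capability sub-tasks
    knowledge_task: str   # one of the 6 esports-knowledge sub-tasks


def taxonomy_counts(items):
    """Count items in each (cognitive_task, knowledge_task) cell."""
    return Counter((it.cognitive_task, it.knowledge_task) for it in items)


# Invented example items; these are not drawn from the benchmark.
items = [
    QAItem("clip_001", "Which weapon is the player holding?", "Rifle",
           "visual_perception", "equipment"),
    QAItem("clip_002", "What item does the player pick up?", "Armor",
           "visual_perception", "equipment"),
    QAItem("clip_003", "Why does the player rotate to the other site?",
           "To support a teammate", "tactical_reasoning", "map_control"),
]

for cell, count in sorted(taxonomy_counts(items).items()):
    print(cell, count)
```

Keeping both labels on every item means the same question set can be sliced by cognitive capability or by esports knowledge without duplicating data.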
Evaluating Video-LLMs
Evaluations of state-of-the-art Video-LLMs show that even the best-performing model reached only 71.58% on the benchmark. The results reveal two consistent imbalances in model capability:
- Stronger performance in basic visual perception compared to deep tactical reasoning.
- Better understanding of macro-progression over fine-grained micro-operations.
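Comparisons like the ones above rest on breaking a single overall score down by taxonomy label. The following is a minimal sketch of that bookkeeping, assuming the metric is plain accuracy (exact match between prediction and reference answer); the record layout, the function `accuracy_by_group`, and the labels are hypothetical, and the toy records are invented rather than taken from the benchmark.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group label: fraction correct} for the given label field."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        correct[group] += int(rec["prediction"] == rec["answer"])
    return {group: correct[group] / total[group] for group in total}


# Toy records with invented labels and predictions; not benchmark results.
records = [
    {"cognitive_task": "perception", "answer": "A", "prediction": "A"},
    {"cognitive_task": "perception", "answer": "B", "prediction": "B"},
    {"cognitive_task": "tactical_reasoning", "answer": "C", "prediction": "A"},
    {"cognitive_task": "tactical_reasoning", "answer": "D", "prediction": "D"},
]

per_task = accuracy_by_group(records, "cognitive_task")
overall = sum(r["prediction"] == r["answer"] for r in records) / len(records)

print(per_task)
print(overall)
```

Reporting per-group accuracy alongside the overall figure is what makes a gap such as perception versus tactical reasoning visible, since a single aggregate number would hide it.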
Insights and Future Directions
Extensive ablation experiments have highlighted intrinsic weaknesses within current Video-LLM architectures. Notably, the EgoEsportsQA dataset serves as a crucial tool for uncovering relationships between real-world and virtual egocentric domains. This connection not only aids in understanding the limitations of existing models but also provides a roadmap for optimizing future esports applications.
Specialized benchmarks like EgoEsportsQA can help drive progress in Video-LLMs, so that future models handle the complexities of varied egocentric environments more effectively.
