AIDABench: AI Data Analytics Benchmark
arXiv:2603.15636v2
As AI-driven document understanding and processing tools become more prevalent in real-world applications, the need for rigorous evaluation standards has grown urgent. Existing benchmarks and evaluations often focus on isolated capabilities or simplified scenarios, and so fail to capture the end-to-end task effectiveness required in practical settings.
To address this gap, we introduce AIDABench, a comprehensive benchmark for evaluating AI systems on complex data analytics tasks in an end-to-end manner. AIDABench encompasses over 600 diverse document analysis tasks across three core capability dimensions:
- Question Answering
- Data Visualization
- File Generation
These tasks are grounded in realistic scenarios involving heterogeneous data types, including spreadsheets, databases, financial reports, and operational records. They reflect analytical demands across diverse industries and job functions.
Notably, the tasks in AIDABench are sufficiently challenging that even human experts require 1-2 hours per question when assisted by AI tools. This underscores the benchmark’s difficulty and real-world complexity.
We evaluate 11 state-of-the-art models on AIDABench, spanning both proprietary (e.g., Claude Sonnet 4.5, Gemini 3 Pro Preview) and open-source (e.g., Qwen3-Max-2026-01-23-Thinking) families. Our results reveal that complex, real-world data analytics tasks remain a significant challenge for current AI systems: the best-performing model achieves only 59.43% pass@1.
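For readers unfamiliar with the headline metric, the following is a minimal sketch of how a pass@1 score is commonly computed, using the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), averaged over tasks. It is an illustration only: the abstract does not describe AIDABench's exact scoring protocol, and the function names and toy results below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Subtract the probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over tasks; each entry is (samples drawn, samples correct)."""
    if not results:
        raise ValueError("no task results given")
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical outcomes for four tasks, one attempt each (not AIDABench data).
toy_results = [(1, 1), (1, 0), (1, 1), (1, 0)]
print(f"pass@1 = {benchmark_pass_at_1(toy_results):.2%}")
```

With a single attempt per task, pass@1 reduces to the fraction of tasks solved on the first try, so a score of 59.43% would mean roughly six in ten tasks were completed correctly at the first attempt.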
Our detailed analysis of failure modes across each capability dimension identifies key challenges for future research:
- Inadequate understanding of context in document interpretation.
- Limitations in visual representation of complex data.
- Challenges in generating coherent and accurate file outputs.
AIDABench offers a principled reference for enterprise procurement, tool selection, and model optimization. It serves as a vital resource for researchers, developers, and organizations seeking to enhance their AI-driven data analytics capabilities.
For those interested in exploring this benchmark further, AIDABench is publicly available at https://github.com/MichaelYang-lyx/AIDABench.
