ChartDiff: Benchmark for Comparative Chart Understanding

ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

Source: arXiv:2603.28902v1 (new submission)

Abstract

Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce ChartDiff, the first large-scale benchmark for cross-chart comparative summarization.

Overview of ChartDiff

ChartDiff consists of 8,541 chart pairs spanning diverse data sources, chart types, and visual styles. Each pair is annotated with summaries that are generated by large language models (LLMs) and verified by human annotators. The summaries describe the key differences between the two charts in trends, fluctuations, and anomalies.
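The release format is not described in this summary. As a rough sketch, assuming a Python loader, one chart-pair record might be modeled as below. Every field name (`pair_id`, `chart_a`, `plotting_library`, `num_series`, and so on) and the helper `multi_series_pairs` are hypothetical placeholders, not ChartDiff's actual schema, and the two records are toy data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChartPair:
    """Hypothetical record for one chart pair; NOT the official ChartDiff schema."""
    pair_id: str           # assumed unique identifier
    chart_a: str           # assumed path to the first chart image
    chart_b: str           # assumed path to the second chart image
    chart_type: str        # assumed label, e.g. "line" or "bar"
    plotting_library: str  # assumed style metadata, e.g. "matplotlib"
    num_series: int        # assumed count of data series plotted per chart
    summary: str           # comparative summary (LLM-generated, human-verified)


def multi_series_pairs(pairs: List[ChartPair]) -> List[ChartPair]:
    """Keep only pairs whose charts plot more than one data series."""
    return [p for p in pairs if p.num_series > 1]


# Toy records for illustration only.
pairs = [
    ChartPair("p1", "a1.png", "b1.png", "line", "matplotlib", 1,
              "Chart A rises steadily while chart B stays flat."),
    ChartPair("p2", "a2.png", "b2.png", "line", "plotly", 3,
              "Two of three series diverge after the midpoint in chart B."),
]
print([p.pair_id for p in multi_series_pairs(pairs)])  # ['p2']
```

Keeping metadata such as series count and plotting library on each record is what makes per-slice analyses, like the multi-series and plotting-library findings, straightforward to compute.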

Evaluation of Models

Using the ChartDiff benchmark, we evaluate a variety of models, including:

General-purpose models
Chart-specialized models
Pipeline-based methods

Our results indicate that frontier general-purpose models achieve the highest summary quality as judged by GPT-based metrics. In contrast, specialized and pipeline-based methods obtain higher ROUGE scores but score lower in human-aligned evaluations. This reveals a clear mismatch between lexical overlap, which is what ROUGE measures, and actual summary quality.
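The exact ROUGE configuration used for ChartDiff is not given here. The sketch below is a simplified ROUGE-L F1 (lowercased whitespace tokens, no stemming, equal weight on precision and recall), and the three example sentences are invented, to show how lexical overlap can reward a summary that inverts the comparison while penalizing a faithful paraphrase.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1: lowercase whitespace tokens, no stemming."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:  # also covers empty inputs
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "sales in chart a rise steadily while sales in chart b fall"
# Swaps "rise" and "fall": high word overlap, but the comparison is inverted.
inverted = "sales in chart a fall steadily while sales in chart b rise"
# Faithful paraphrase with different wording.
paraphrase = "chart a shows growing sales but chart b declines"

print(f"{rouge_l_f1(inverted, reference):.3f}")    # 0.833
print(f"{rouge_l_f1(paraphrase, reference):.3f}")  # 0.476
```

The factually inverted summary outscores the faithful one, which is the kind of gap between lexical overlap and quality described above. Real evaluations typically rely on a maintained package such as `rouge-score` rather than a hand-rolled scorer like this one.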

Key Findings

Several significant findings emerged from our analysis:

Multi-series charts continue to pose challenges across all model families.
Strong end-to-end models are relatively robust to differences in plotting libraries.
The comparative reasoning inherent in multi-chart analysis remains a significant challenge for current vision-language models.
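A robustness claim like the plotting-library finding can be made concrete by grouping per-example scores by an attribute and measuring the spread of the group means. The sketch below is one hypothetical way to do that, not the paper's analysis; the function names `group_means` and `robustness_gap`, the library labels, and the scores are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def group_means(scored: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a per-example quality score within each group label."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for label, score in scored:
        groups[label].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


def robustness_gap(scored: Iterable[Tuple[str, float]]) -> float:
    """Spread between the best- and worst-scoring groups; smaller means more robust."""
    means = group_means(scored)
    return max(means.values()) - min(means.values())


# Toy scores for illustration only; these are not ChartDiff results.
by_library = [("matplotlib", 0.75), ("matplotlib", 0.25),
              ("plotly", 0.5), ("plotly", 0.5),
              ("seaborn", 0.25)]
print(group_means(by_library))     # {'matplotlib': 0.5, 'plotly': 0.5, 'seaborn': 0.25}
print(robustness_gap(by_library))  # 0.25
```

The same functions apply unchanged to other slices, for example labeling each example "single-series" or "multi-series" to see how much harder multi-series pairs are for a given model.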

Implications for Future Research

These findings position ChartDiff as a challenging benchmark for advancing research on multi-chart understanding. Closing the gaps it exposes, particularly on multi-series charts and in evaluation metrics that track actual summary quality rather than lexical overlap, will be essential for improving models' comparative chart reasoning.

Conclusion

In summary, ChartDiff extends the evaluation of chart comprehension from single charts to pairs of charts. By providing a large-scale dataset focused on comparative reasoning, we hope to inspire further advances in models that can interpret and summarize complex visual data.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
