ReactBench: Benchmarking Structural Reasoning in MLLMs


ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

Source: arXiv:2604.15994v1

Abstract: Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities degrade sharply, even on tasks as basic as counting endpoints. Existing benchmarks fail to probe this gap, focusing on semantic comprehension rather than structural reasoning. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams.

These real-world scientific diagrams offer an ideal testbed because they naturally span diverse structures from linear chains to cyclic graphs, while requiring both precise local recognition and coherent global reasoning. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical task dimensions.

Key Findings

Extensive evaluation across 17 MLLMs reveals a significant performance gap exceeding 30% between anchor-based tasks and holistic structural reasoning tasks. Controlled ablations confirm this bottleneck lies in reasoning, not perception. These findings expose a fundamental deficit in structural understanding and establish directions for advancing visual reasoning.

Introduction to ReactBench

ReactBench addresses a gap left by existing benchmarks, which focus primarily on semantic comprehension. It tests structural reasoning instead, using chemical reaction diagrams as the test material.

Why Chemical Reaction Diagrams?

Chemical reaction diagrams are inherently complex, featuring a variety of structures that challenge the reasoning capabilities of MLLMs. These structures include:

  • Linear chains
  • Cyclic graphs
  • Branching paths
  • Converging flows

Each of these structures requires a model not only to identify individual components but also to understand how they relate to one another within the diagram as a whole.
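To make these structure types concrete, here is a minimal sketch that treats a reaction diagram as a directed graph and reports its endpoints, branching points, converging points, and whether it contains a cycle. The edge-list representation, the function name analyze_topology, and the single-letter node labels are assumptions made for illustration; they are not ReactBench's actual data format.

```python
from collections import defaultdict


def analyze_topology(edges):
    """Summarize a directed graph given as (source, target) pairs.

    Hypothetical helper for illustration; not part of ReactBench.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        successors[src].add(dst)
        predecessors[dst].add(src)
        nodes.update((src, dst))

    # Endpoints have no outgoing edge; start points have no incoming edge.
    endpoints = sorted(n for n in nodes if not successors.get(n))
    starts = sorted(n for n in nodes if not predecessors.get(n))
    # Branching: more than one outgoing edge. Converging: more than one incoming.
    branching = sorted(n for n in nodes if len(successors.get(n, ())) > 1)
    converging = sorted(n for n in nodes if len(predecessors.get(n, ())) > 1)

    # Cycle detection by depth-first search with three node states.
    # Recursion is fine here because diagrams are assumed to be small.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {n: UNSEEN for n in nodes}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in successors.get(node, ()):
            if state[nxt] == IN_PROGRESS:
                return True  # back edge found, so the graph has a cycle
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    has_cycle = any(state[n] == UNSEEN and visit(n) for n in sorted(nodes))

    return {
        "starts": starts,
        "endpoints": endpoints,
        "branching": branching,
        "converging": converging,
        "has_cycle": has_cycle,
    }


# A made-up diagram that branches at B and converges again at E.
branched = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
# A made-up diagram with a cycle A -> B -> C -> A and a side product D.
cyclic = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D")]

if __name__ == "__main__":
    print(analyze_topology(branched))
    print(analyze_topology(cyclic))
```

For a program holding the edge list, counting endpoints is a single pass over the nodes. An MLLM has to recover the same information from pixels, which is where the abstract reports that performance degrades.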

Benchmark Composition

ReactBench consists of 1,618 expert-annotated question-answer pairs, categorized across four hierarchical task dimensions:

  • Basic recognition tasks
  • Intermediate reasoning tasks
  • Complex structural reasoning tasks
  • Holistic understanding tasks

Evaluation Results

Across the 17 MLLMs evaluated, models performed significantly better on anchor-based tasks than on holistic structural reasoning tasks. The gap exceeds 30%, and controlled ablations attribute it to reasoning rather than perception, which points to a need for methods that improve structural understanding in MLLMs.
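As a rough illustration of how such a gap could be measured, the sketch below computes accuracy per task category and the difference between two categories. The category labels, the record format, and the toy results are all invented for this example. It also assumes the reported gap is expressed in percentage points, which the abstract does not state explicitly.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """Return {category: accuracy} from (category, is_correct) records.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / totals[category] for category in totals}


def gap_in_points(accuracy, higher, lower):
    """Accuracy difference between two categories, in percentage points."""
    return (accuracy[higher] - accuracy[lower]) * 100


# Toy data, invented for illustration only: 8 of 10 anchor-based answers
# and 4 of 10 holistic answers are marked correct.
toy_results = (
    [("anchor_based", True)] * 8
    + [("anchor_based", False)] * 2
    + [("holistic", True)] * 4
    + [("holistic", False)] * 6
)

if __name__ == "__main__":
    acc = accuracy_by_category(toy_results)
    print(acc)
    print(round(gap_in_points(acc, "anchor_based", "holistic"), 1))
```

Reporting the gap per category, rather than a single overall score, is what lets a benchmark separate local recognition from global structural reasoning.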

Conclusion

ReactBench not only highlights the limitations of current MLLMs in structural reasoning but also sets the stage for future research aimed at bridging this gap. By focusing on complex topological structures within chemical reaction diagrams, we aim to advance the field of visual reasoning and improve the capabilities of MLLMs in understanding complex scientific data.



Lazarus Omolua, https://richlyai.com/blog
