Multi Chain-of-Thought Voting for Geometric Reasoning AI


Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models

Source: arXiv:2604.00890v1 (new submission)

Abstract

Geometric Problem Solving (GPS) remains central to enhancing mathematical reasoning in large language models because it requires combining diagrammatic understanding, symbolic manipulation, and logical inference. Existing work has chiefly focused on aligning diagram descriptions with text literals and then solving the problem, using neural, symbolic, or neuro-symbolic approaches. This addresses only the first two requirements, diagrammatic understanding and symbolic manipulation, and leaves logical inference underdeveloped.

In these approaches, logical inference is often limited to a single chain of thought (CoT). To address this weakness, the paper proposes MARS-GPS, which generates multiple parallel reasoning rollouts augmented with Python code execution for numerical verification, ranks them using token-level entropy as a confidence signal, and aggregates answers through a multi-stage voting and self-verification pipeline.
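
The abstract does not give the exact ranking formula or the voting stages. The sketch below assumes one common reading: score each rollout by its mean token-level entropy, keep the k lowest-entropy rollouts, and take a majority vote, breaking ties in favour of the most confident rollout. `Rollout`, `mean_entropy`, `vote`, and `top_k` are hypothetical names, and the per-token probability distributions would in practice come from the model's output logits.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rollout:
    answer: str                     # final answer extracted from the rollout text
    token_dists: list[list[float]]  # one probability distribution per generated token


def token_entropy(dist: list[float]) -> float:
    """Shannon entropy (in nats) of a single token's probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mean_entropy(rollout: Rollout) -> float:
    """Average token-level entropy; lower means the model was more confident."""
    if not rollout.token_dists:
        return float("inf")  # no tokens: treat as least confident
    return sum(token_entropy(d) for d in rollout.token_dists) / len(rollout.token_dists)


def vote(rollouts: list[Rollout], top_k: int) -> str:
    """Keep the top_k lowest-entropy rollouts and return their majority answer.

    Ties are broken in favour of the answer held by the most confident rollout.
    """
    if not rollouts or top_k < 1:
        raise ValueError("need at least one rollout and top_k >= 1")
    kept = sorted(rollouts, key=mean_entropy)[:top_k]
    counts = Counter(r.answer for r in kept)
    best = max(counts.values())
    # `kept` is already sorted by confidence, so the first match wins ties
    return next(r.answer for r in kept if counts[r.answer] == best)


if __name__ == "__main__":
    confident = Rollout("30", [[0.9, 0.1], [0.95, 0.05]])
    unsure_a = Rollout("45", [[0.6, 0.4], [0.5, 0.5]])
    unsure_b = Rollout("45", [[0.5, 0.5], [0.5, 0.5]])
    print(vote([confident, unsure_a, unsure_b], top_k=3))  # 45: two votes beat one
    print(vote([confident, unsure_a, unsure_b], top_k=1))  # 30: only the most confident rollout is kept
```

Filtering by confidence before voting lets a few low-entropy rollouts outweigh many uncertain ones; how MARS-GPS actually balances the two is not specified in the abstract.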

Key Contributions of MARS-GPS

  • Multiple Reasoning Rollouts: MARS-GPS generates up to 16 parallel reasoning paths instead of relying on a single chain of thought, broadening the range of logical inference it explores.
  • Numerical Verification: Each rollout is augmented with Python code execution that checks its numerical results.
  • Confidence Ranking: Token-level entropy serves as a confidence signal for ranking rollouts, with lower entropy indicating a more confident generation.
  • Multi-Stage Voting: Answers are aggregated through a multi-stage voting and self-verification pipeline, which improves the accuracy and consistency of the final answer.
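
The abstract states only that rollouts are augmented with Python code execution for numerical verification. One plausible reading, sketched here under that assumption, is to run the code a rollout emits in a separate interpreter and check that its printed result matches the rollout's stated answer. `run_snippet`, `numerically_verified`, and the tolerance are hypothetical; note that `subprocess` is not a security sandbox, so untrusted model output would need proper isolation.

```python
import math
import subprocess
import sys
from typing import Optional


def run_snippet(code: str, timeout: float = 5.0) -> Optional[str]:
    """Run a model-generated Python snippet in a fresh interpreter; return stdout or None."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # treat hung code as a failed verification
    if result.returncode != 0:
        return None  # exceptions or non-zero exits also fail verification
    return result.stdout.strip()


def numerically_verified(claimed: float, code: str, rel_tol: float = 1e-6) -> bool:
    """True if the last line the snippet prints matches the rollout's claimed answer."""
    out = run_snippet(code)
    if not out:
        return False
    try:
        computed = float(out.splitlines()[-1])
    except ValueError:
        return False
    return math.isclose(computed, claimed, rel_tol=rel_tol)


if __name__ == "__main__":
    # A rollout claims the hypotenuse of a 3-4 right triangle is 5
    snippet = "import math\nprint(math.hypot(3, 4))"
    print(numerically_verified(5.0, snippet))  # True
    print(numerically_verified(6.0, snippet))  # False
```

Whether a failed check discards the rollout outright or only down-weights its vote is a design choice the abstract leaves open.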

Empirical Results

With eight parallel rollouts, MARS-GPS achieves 88.8% accuracy on the Geometry3K benchmark, a nearly 11% improvement over the previous state of the art. Accuracy also scales consistently with the number of rollouts: increasing from one to sixteen rollouts yields a 6.0% improvement on the ablation subset.

Conclusion

MARS-GPS moves geometric problem solving in large language models beyond a single chain of thought. By combining multiple reasoning paths with code-based numerical verification, entropy-based confidence ranking, and multi-stage voting, it targets the logical-inference gap left by existing methods. The code and data are available in an anonymous repository: MARS-GPS Repository.

Future Work

Looking ahead, the authors suggest that further research should focus on:

  • Expanding the range of geometric problems tackled by MARS-GPS.
  • Investigating the application of the multi-chain-of-thought approach in other domains of mathematical reasoning.
  • Enhancing the efficiency of the multi-stage voting mechanism to accommodate even larger datasets.



Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
