CoT2-Meta: Efficient Metacognitive Control for Test-Time Reasoning

CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning

Source: arXiv:2603.28135v1 (new submission)

Abstract: Recent test-time reasoning methods improve performance by generating more candidate chains or by searching over larger reasoning trees. However, they typically lack explicit control over when to expand a reasoning path, what to prune, how to repair errors, and when to abstain. To address these limitations, we introduce CoT2-Meta, a training-free metacognitive reasoning framework that combines object-level chain-of-thought generation with meta-level control over partial reasoning trajectories.

Framework Components

CoT2-Meta integrates four components:

Strategy-Conditioned Thought Generation: The object-level model produces each reasoning step conditioned on an explicit reasoning strategy.
Tree-Structured Search: Partial reasoning trajectories are organized as a tree, so the framework can explore and compare alternative paths instead of committing to a single chain.
Online Process Oracle: An oracle scores reasoning at the step level while the search is running, so weak partial trajectories can be identified before they are completed.
Meta-Controller: The meta-controller allocates computation by deciding when to expand, prune, repair, or stop, and when to fall back.
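The post describes these components only at a high level. The Python sketch below shows one way a meta-controller could drive budgeted best-first tree search with expand, prune, repair, stop, and abstain decisions. Everything concrete in it is a hypothetical stand-in, not the paper's method: the strategy names, `generate_step` (a random step quality in place of an LLM call), `oracle_score` (minimum step quality in place of the process oracle), the one-unit-per-step budget, and all thresholds.

```python
import heapq
import random
from dataclasses import dataclass, field

# Hypothetical strategy labels; the post does not list the paper's strategies.
STRATEGIES = ("decompose", "work_backward", "check_cases")


@dataclass(order=True)
class Node:
    priority: float                       # -score, so heapq pops the best node first
    steps: tuple = field(compare=False)   # partial trajectory: ((strategy, quality), ...)
    score: float = field(compare=False)   # stand-in process-oracle score


def generate_step(strategy, rng):
    """Hypothetical stand-in for strategy-conditioned generation: (strategy, quality)."""
    return (strategy, rng.random())


def oracle_score(steps):
    """Hypothetical stand-in process oracle: a chain is as good as its weakest step."""
    return min(quality for _, quality in steps)


def meta_control(budget=30, max_depth=4, prune_below=0.2, repair_below=0.4,
                 stop_at=0.7, abstain_below=0.3, seed=0):
    """Best-first tree search in which every generated step costs one budget unit."""
    rng = random.Random(seed)
    frontier = [Node(-1.0, (), 1.0)]  # root: empty trajectory
    best_complete = None
    while frontier and budget > 0:
        node = heapq.heappop(frontier)
        if len(node.steps) == max_depth:          # a finished trajectory
            if best_complete is None or node.score > best_complete.score:
                best_complete = node
            if node.score >= stop_at:             # STOP: confident enough
                break
            continue
        for strategy in STRATEGIES:               # EXPAND: one child per strategy
            if budget == 0:
                break
            step = generate_step(strategy, rng)
            budget -= 1
            if step[1] < repair_below and budget > 0:
                # REPAIR: regenerate a weak step once and keep the better version
                retry = generate_step(strategy, rng)
                budget -= 1
                step = max(step, retry, key=lambda s: s[1])
            steps = node.steps + (step,)
            score = oracle_score(steps)
            if score < prune_below:               # PRUNE: drop low-scoring branches
                continue
            heapq.heappush(frontier, Node(-score, steps, score))
    # If the budget ran out, finished trajectories may still sit in the frontier.
    for node in frontier:
        if len(node.steps) == max_depth and (
                best_complete is None or node.score > best_complete.score):
            best_complete = node
    if best_complete is None or best_complete.score < abstain_below:
        return None, budget                       # FALLBACK: abstain
    return best_complete, budget


if __name__ == "__main__":
    answer, left = meta_control(budget=30, seed=0)
    if answer is None:
        print("abstain; budget left:", left)
    else:
        print([s for s, _ in answer.steps], round(answer.score, 3), "budget left:", left)
```

Scoring a trajectory by its weakest step is one common way to aggregate step-level scores. Because a child's score can never exceed its parent's under this rule, a weak prefix cannot be rescued later, which is why the sketch repairs a step as soon as it is generated rather than after the chain is finished.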

Performance Metrics

Under matched inference budgets, CoT2-Meta consistently outperforms strong single-path, sampling-based, and search-based baselines, including ReST-MCTS. Results on the six core benchmarks:

MATH: 92.8 exact match (+3.6)
GPQA: 90.4 accuracy (+5.2)
GSM8K: 98.65 exact match (+1.15)
BBEH: 75.8 accuracy (+2.0)
MMMU-Pro: 85.6 accuracy (+4.3)
HLE: 48.8 accuracy (+4.3)

Figures in parentheses are gains, in points, over the strongest non-CoT2-Meta baseline on each benchmark.
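The post reports both scores and gains, so the strongest baseline's score on each benchmark follows by subtraction. The short Python sketch below does that arithmetic and, assuming every metric is a percentage on a 0 to 100 scale, also computes the relative error reduction: the share of the baseline's remaining errors that the gain removes. The helper names are illustrative only.

```python
# Reported CoT2-Meta scores and gains (in points) over the strongest
# non-CoT2-Meta baseline, copied from the results listed above.
REPORTED = {
    "MATH": (92.8, 3.6),
    "GPQA": (90.4, 5.2),
    "GSM8K": (98.65, 1.15),
    "BBEH": (75.8, 2.0),
    "MMMU-Pro": (85.6, 4.3),
    "HLE": (48.8, 4.3),
}


def implied_baseline(score, gain):
    """Strongest-baseline score implied by subtracting the reported gain."""
    return round(score - gain, 2)


def relative_error_reduction(score, gain):
    """Share of the baseline's remaining error (100 - baseline) removed by the gain.

    Assumes the metric is a percentage on a 0-100 scale.
    """
    return gain / (100.0 - implied_baseline(score, gain))


if __name__ == "__main__":
    for name, (score, gain) in REPORTED.items():
        base = implied_baseline(score, gain)
        rer = relative_error_reduction(score, gain)
        print(f"{name:9s} baseline {base:6.2f}  gain {gain:+5.2f}  error reduced by {rer:.0%}")
```

On this arithmetic, the +1.15 on GSM8K removes about 46% of the baseline's remaining errors, while the +4.3 on HLE removes under 8%, so absolute gains are not directly comparable across benchmarks with very different headroom.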

Broader Implications

Beyond these core results, CoT2-Meta remains effective across a broader suite of 15 benchmarks covering knowledge and QA, multi-hop reasoning, coding, and out-of-distribution evaluation. Additional analyses show:

Better compute scaling
Improved calibration
Stronger selective prediction
Targeted repair behavior
Consistent gains across different backbone families
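The post does not say how calibration and selective prediction were measured. As a hedged reference point, the Python sketch below implements two common definitions: expected calibration error (ECE) with equal-width confidence bins, and selective accuracy, i.e. accuracy on the fraction of questions the system answers when it abstains on the lowest-confidence remainder. These are assumptions about the evaluation, not the paper's protocol, and the toy confidences are made up for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


def selective_accuracy(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction answered with the highest confidence."""
    k = max(1, round(coverage * len(confidences)))
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return sum(correct[i] for i in order[:k]) / k


if __name__ == "__main__":
    # Made-up toy data: per-question confidence and 1/0 correctness.
    conf = [0.95, 0.9, 0.8, 0.6, 0.55, 0.3]
    hit = [1, 1, 1, 0, 1, 0]
    print("ECE:", round(expected_calibration_error(conf, hit), 4))
    for cov in (0.5, 1.0):
        print(f"coverage {cov:.0%}: accuracy {selective_accuracy(conf, hit, cov):.3f}")
```

Lower ECE means stated confidence tracks accuracy more closely; higher selective accuracy at a fixed coverage means the confidence signal ranks correct answers above wrong ones, which is what makes abstention useful.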

These findings suggest that explicit metacognitive control is a practical design principle for building reliable, compute-efficient test-time reasoning systems.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
