CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning
arXiv:2603.28135v1 (Announce Type: new)
Abstract: Recent test-time reasoning methods improve performance by generating more candidate chains or by searching over larger reasoning trees. However, they often lack explicit control over critical decisions: when to expand a reasoning path, what to prune, how to repair errors, and when to abstain. To address these limitations, we introduce CoT2-Meta, a training-free metacognitive reasoning framework that combines object-level chain-of-thought generation with meta-level control over partial reasoning trajectories.
Framework Components
CoT2-Meta integrates four components:
- Strategy-Conditioned Thought Generation: the object-level generator produces each thought conditioned on a chosen reasoning strategy.
- Tree-Structured Search: partial reasoning trajectories are organized as a tree, so alternative paths can be explored and compared instead of committing to a single chain.
- Online Process Oracle: the oracle scores reasoning at the step level while trajectories are being built, giving a running signal about the quality of partial reasoning.
- Meta-Controller: the controller allocates compute by deciding when to expand, prune, repair, or stop, and when to fall back.
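The abstract does not spell out the control policy. The following is therefore only a minimal sketch of how the four components could interact under a fixed budget. Every name (`generate`, `oracle`, `repair`, `meta_search`) and every threshold is hypothetical. The toy generator and oracle stand in for an LLM and a step-level scorer, and best-first expansion is a simple stand-in for the paper's tree search, not its actual algorithm.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical thresholds; the paper's actual control policy may differ.
PRUNE_BELOW = 0.3    # drop a branch whose newest step scores below this
REPAIR_BELOW = 0.5   # try to repair a step scoring below this
STOP_AT = 0.9        # stop early once a finished trajectory scores this high
ABSTAIN_BELOW = 0.6  # abstain if the best finished trajectory scores below this


@dataclass
class Node:
    steps: list          # partial reasoning trajectory
    score: float         # oracle score of the newest step
    done: bool = False   # True once the trajectory reaches a final answer


def generate(node, strategy):
    """Hypothetical object-level generator: one strategy-conditioned step."""
    k = len(node.steps)
    return f"{strategy}-step{k}", k + 1 >= 3  # toy rule: finish after 3 steps


def oracle(steps):
    """Hypothetical process oracle: toy step score in [0, 1] keyed by strategy."""
    strategy = steps[-1].split("-", 1)[0]
    return {"decompose": 0.9, "guess": 0.45}.get(strategy, 0.1)


def repair(steps):
    """Hypothetical repair: regenerate the newest step with another strategy."""
    return steps[:-1] + ["decompose-" + steps[-1].split("-", 1)[1]]


def meta_search(strategies, budget):
    """Best-first search in which a meta-controller spends a fixed step budget."""
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    frontier = [(-1.0, next(tie), Node(steps=[], score=1.0))]
    best = None
    while frontier and budget > 0:
        _, _, node = heapq.heappop(frontier)  # EXPAND the most promising node
        for strategy in strategies:
            if budget == 0:
                break
            step, done = generate(node, strategy)
            budget -= 1                       # each generation costs one unit
            steps = node.steps + [step]
            score = oracle(steps)
            if score < PRUNE_BELOW:
                continue                      # PRUNE a hopeless branch
            if score < REPAIR_BELOW and budget > 0:
                steps = repair(steps)         # REPAIR a weak step (costs one unit)
                budget -= 1
                score = oracle(steps)
            child = Node(steps, score, done)
            if not done:
                heapq.heappush(frontier, (-score, next(tie), child))
            elif best is None or child.score > best.score:
                best = child
        if best is not None and best.score >= STOP_AT:
            break                             # STOP: good-enough answer found
    if best is None or best.score < ABSTAIN_BELOW:
        return None                           # ABSTAIN / FALL BACK
    return best.steps


print(meta_search(["decompose", "guess"], budget=20))
# ['decompose-step0', 'decompose-step1', 'decompose-step2']
print(meta_search(["decompose", "guess"], budget=2))
# None
```

With a budget of 20 units, the sketch repairs each weak `guess` step, completes a three-step trajectory, and stops early. With a budget of 2, no trajectory finishes, so it returns `None` (abstains). Keeping the thresholds separate from the search loop mirrors the object-level/meta-level split: the generator and oracle produce steps and scores, while the controller decides only how to spend the budget.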
Performance Metrics
Under matched inference budgets, CoT2-Meta consistently outperforms strong baselines. These include single-path and sampling-based methods as well as search-based approaches such as ReST-MCTS. The reported results are listed below; each figure in parentheses is the gain in points over the strongest non-CoT2-Meta baseline:
- MATH: 92.8 EM (+3.6)
- GPQA: 90.4 accuracy (+5.2)
- GSM8K: 98.65 EM (+1.15)
- BBEH: 75.8 accuracy (+2.0)
- MMMU-Pro: 85.6 accuracy (+4.3)
- HLE: 48.8 accuracy (+4.3)
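Near-ceiling benchmarks make raw point gains hard to compare. The short calculation below recovers the baseline score implied by each reported gain and expresses the gain as a share of the baseline's remaining errors. It assumes that the gains are absolute point differences and that EM and accuracy are both reported on a 0-100 scale. The helper names are illustrative.

```python
# CoT2-Meta scores and gains (points) over the strongest non-CoT2-Meta
# baseline, as reported. Assumes all scores are percentages on a 0-100 scale.
RESULTS = {
    "MATH": (92.8, 3.6),
    "GPQA": (90.4, 5.2),
    "GSM8K": (98.65, 1.15),
    "BBEH": (75.8, 2.0),
    "MMMU-Pro": (85.6, 4.3),
    "HLE": (48.8, 4.3),
}


def implied_baseline(score, gain):
    """Baseline score implied by an absolute point gain."""
    return round(score - gain, 2)


def relative_error_reduction(score, gain):
    """Percentage of the baseline's errors removed by the gain."""
    baseline_error = 100.0 - implied_baseline(score, gain)
    return 100.0 * gain / baseline_error


for name, (score, gain) in RESULTS.items():
    base = implied_baseline(score, gain)
    err_cut = relative_error_reduction(score, gain)
    print(f"{name}: implied baseline {base}, errors reduced by {err_cut:.1f}%")
```

On GSM8K, for example, the +1.15 points move the score from an implied 97.5 to 98.65, which removes 46% of the baseline's remaining errors. On BBEH, the +2.0 points remove about 7.6%.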
Broader Implications
Beyond these core results, CoT2-Meta remains effective across a broader suite of 15 benchmarks spanning knowledge and QA, multi-hop reasoning, coding, and out-of-distribution evaluation. Further analyses show:
- Better compute scaling
- Improved calibration
- Stronger selective prediction
- Targeted repair behavior
- Consistent gains across different backbone families
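The source does not name its calibration or selective-prediction metrics. A common choice, assumed here, is binned expected calibration error (ECE) together with the coverage/accuracy trade-off obtained by answering only above a confidence threshold, which corresponds to abstaining when confidence is low. The function names and the confidences and labels below are made-up toy values, not results from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - confidence|.

    Assumes confidences lie in [0, 1]; bins are (lo, hi], with 0.0 in bin 0.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - avg_conf)
    return ece


def selective_accuracy(confidences, correct, threshold):
    """Answer only when confidence >= threshold; return (coverage, accuracy)."""
    kept = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Toy data (hypothetical): per-question confidence and 1/0 correctness.
conf = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3]
hit = [1, 1, 1, 0, 1, 0]
print(expected_calibration_error(conf, hit))
print(selective_accuracy(conf, hit, threshold=0.7))  # (0.5, 1.0)
```

Raising the threshold trades coverage for accuracy. In this toy example, answering only the three most confident questions halves coverage but lifts accuracy from 4/6 to 1.0. "Stronger selective prediction" would mean a better curve of this kind across thresholds.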
These findings suggest that explicit metacognitive control is a practical design principle for building reliable, compute-efficient test-time reasoning systems.
