Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search
Summary: arXiv:2509.25835v4 Announce Type: replace
Abstract: Test-time scaling improves large language models (LLMs) on long-horizon reasoning tasks by allocating more compute at inference. LLM inference via tree search (LITS) achieves strong performance but is highly inefficient. We propose Chain-in-Tree (CiT), a plug-in framework that decides when to branch during search instead of expanding at every step. CiT introduces lightweight Branching Necessity (BN) evaluations, including BN-DP (direct prompting) and BN-SC (self-consistency). Integrated into Tree of Thoughts, ReST-MCTS, and RAP, BN-DP reduces token generation, model calls, and runtime by 75-85% on GSM8K and Math500, with often negligible or no accuracy loss. BN-SC typically yields substantial savings (up to 80%) generally but shows instability in 1-4 out of 14 settings, caused by a small subset of examples that produce extremely long reasoning steps. We theoretically prove that BN-DP never increases policy invocations and release unified implementations applicable across LITS frameworks. The full codebase is publicly available at https://github.com/xinzhel/chain_in_tree.
Introduction
Large language models (LLMs) have transformed the landscape of natural language processing, particularly in tasks that require long-horizon reasoning. Despite their capabilities, traditional inference methods can be computationally taxing and inefficient, especially when applied to tree search methods like LLM Inference via Tree Search (LITS).
Challenges with Current Approaches
While LITS has demonstrated strong performance across various benchmarks, its inefficiencies pose a significant barrier to practical deployment. The need for constant branching at every step of the search process leads to increased computational costs, which in turn affects the overall performance and scalability of LLM applications in real-world scenarios.
Introducing Chain-in-Tree (CiT)
The Chain-in-Tree (CiT) framework addresses these inefficiencies by allowing the model to make informed decisions about when to branch during the search process. This approach not only optimizes computational resources but also focuses on enhancing the quality of reasoning.
Branching Necessity Evaluations
CiT incorporates two primary evaluations for branching necessity:
- BN-DP (Direct Prompting): This method evaluates the necessity of branching based on direct prompts, which has shown to reduce token generation, model calls, and runtime by 75-85% on datasets such as GSM8K and Math500, with little to no loss in accuracy.
- BN-SC (Self-Consistency): This evaluation can yield significant savings of up to 80%. However, it may exhibit instability in certain scenarios, particularly with a small subset of examples that require extensive reasoning steps.
Performance and Results
When integrated into various frameworks, including Tree of Thoughts, ReST-MCTS, and RAP, BN-DP consistently shows a reduction in resource consumption without compromising the quality of outputs. The theoretical backing for BN-DP guarantees that it does not increase policy invocations, thus providing a robust solution for optimizing LLM efficiency.
Availability and Future Work
The complete codebase for Chain-in-Tree is publicly available, encouraging further research and implementation across various LITS frameworks. As LLMs continue to evolve, approaches like CiT pave the way for more efficient reasoning processes in large-scale applications.
