Efficient Pruning of Long Chain-of-Thought in LRMs

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization

Source: arXiv:2508.10164v2

Abstract

Recent advances in Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning. However, their lengthy outputs increase computational costs and may lead to overthinking, raising challenges in balancing reasoning effectiveness and efficiency. Current solutions often compromise reasoning quality or require extensive resources. In this paper, we investigate how to reduce the generation length of LRMs with limited tuning.

Introduction

Large Reasoning Models (LRMs) tackle complex tasks by generating long chains of intermediate reasoning before giving an answer. A significant drawback of these models is their tendency to produce excessively long outputs during the reasoning process. This raises computational costs and can also lead to inefficiencies such as overthinking or irrelevant elaboration.

Problem Statement

As LRMs continue to evolve, researchers face a dual challenge: improving the quality of reasoning while keeping the length of outputs under control. Existing methods often either degrade reasoning quality or require extensive computational resources, which makes them less feasible in practice. What is needed is a method that shortens outputs with limited tuning and without sacrificing reasoning quality.

Methodology

To address this challenge, we present Length Controlled Preference Optimization (LCPO). The methodology involves three key steps:

Generation Path Analysis:

We analyze generation path distributions to identify patterns in the outputs of LRMs. This analysis helps in understanding which trajectories lead to longer outputs.

Difficulty Estimation:

We implement a filtering mechanism based on difficulty estimation to streamline the generated trajectories, focusing on those that contribute to effective reasoning without excessive length.

Preference Optimization Objectives:

We explore the convergence characteristics of various preference optimization objectives within a unified Bradley-Terry loss-based framework. This allows us to refine our approach systematically.
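
The three steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Trajectory` record, the error-rate difficulty proxy, the `max_difficulty` threshold, and the rule of pairing the shortest correct trajectory against the longest correct one are all assumptions made for illustration. The loss shown is the generic Bradley-Terry form, the negative log-sigmoid of a scaled score margin, and not the specific LCPO objective, which this summary does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled reasoning trace for a prompt (hypothetical record type)."""
    text: str
    num_tokens: int
    is_correct: bool


def estimate_difficulty(trajectories):
    """Assumed difficulty proxy: fraction of sampled trajectories that are wrong."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    wrong = sum(1 for t in trajectories if not t.is_correct)
    return wrong / len(trajectories)


def build_length_pair(trajectories, max_difficulty=0.5):
    """Return (chosen, rejected) for one prompt, or None if the prompt is filtered.

    Assumption: prompts whose error rate exceeds max_difficulty are dropped,
    and the pair prefers the shortest correct trace over the longest correct one.
    """
    if estimate_difficulty(trajectories) > max_difficulty:
        return None
    correct = sorted(
        (t for t in trajectories if t.is_correct),
        key=lambda t: t.num_tokens,
    )
    # A pair needs two correct traces of different length.
    if len(correct) < 2 or correct[0].num_tokens == correct[-1].num_tokens:
        return None
    return correct[0], correct[-1]


def bradley_terry_loss(chosen_score, rejected_score, beta=1.0):
    """Generic Bradley-Terry loss: -log(sigmoid(beta * (chosen - rejected))).

    Computed as softplus(-margin) in a numerically stable way.
    """
    margin = beta * (chosen_score - rejected_score)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a training loop, `chosen_score` and `rejected_score` would typically be model-derived quantities such as sequence log-probabilities. When the two scores are equal the loss is log 2, and it falls as the model scores the shorter trajectory higher than the longer one.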

Results

Our experiments show that LCPO reduces the average output length of LRMs by over 50% across multiple benchmarks. This reduction in length does not come at the expense of reasoning performance.
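
To make the headline metric concrete, here is a small Python sketch of how a relative reduction in average output length could be computed. The function name and the token counts in the usage note are hypothetical and are not taken from the paper's experiments.

```python
def relative_length_reduction(baseline_lengths, pruned_lengths):
    """Fraction by which the mean output length shrinks (0.5 means 50% shorter)."""
    if not baseline_lengths or not pruned_lengths:
        raise ValueError("both length lists must be non-empty")
    baseline_mean = sum(baseline_lengths) / len(baseline_lengths)
    pruned_mean = sum(pruned_lengths) / len(pruned_lengths)
    return (baseline_mean - pruned_mean) / baseline_mean
```

For example, with made-up baseline lengths of 1000, 2000, and 3000 tokens and pruned lengths of 400, 900, and 1400 tokens, the mean drops from 2000 to 900 tokens, a reduction of 55%. Accuracy on the same prompts would need to be compared separately to check that reasoning performance is maintained.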

Conclusion

Our work shows that a computationally efficient approach can guide LRMs toward effective reasoning without lengthy outputs. Length Controlled Preference Optimization offers a practical option for researchers and practitioners who want to use LRMs while keeping generation costs down.

Future Work

Future research may explore the scalability of LCPO and its application to different model architectures and tasks, further enhancing its adaptability in various domains.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
