Efficient Pruning of Long Chain-of-Thought in LRMs

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization

arXiv:2508.10164v2

Abstract

Recent advances in Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning. However, their lengthy outputs increase computational costs and may lead to overthinking, raising challenges in balancing reasoning effectiveness and efficiency. Current solutions often compromise reasoning quality or require extensive resources. In this paper, we investigate how to reduce the generation length of LRMs with limited tuning.

Introduction

Large Reasoning Models (LRMs) can tackle complex tasks that require multi-step reasoning. A significant drawback, however, is their tendency to produce excessively long outputs during the reasoning process. This raises computational costs and can also lead to inefficiencies such as overthinking or irrelevant elaboration.

Problem Statement

As LRMs continue to evolve, researchers face a dual challenge: preserving the quality of reasoning while reducing the length of outputs. Existing methods often sacrifice reasoning quality or require extensive computational resources, which makes them less feasible in practical applications. What is needed is an efficient method that shortens outputs without degrading reasoning quality.

Methodology

To address this challenge, we present Length Controlled Preference Optimization (LCPO). The methodology involves the following key steps:

  • Generation Path Analysis:

    We analyze generation path distributions to identify patterns in the outputs of LRMs. This analysis helps in understanding which trajectories lead to longer outputs.

  • Difficulty Estimation:

    We implement a filtering mechanism based on difficulty estimation to streamline the generated trajectories, focusing on those that contribute to effective reasoning without excessive length.

  • Preference Optimization Objectives:

    We explore the convergence characteristics of various preference optimization objectives within a unified Bradley-Terry loss-based framework. This allows us to refine our approach systematically.
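The steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Sample` record, the difficulty proxy (fraction of incorrect samples per prompt), the `max_difficulty` threshold, and the shortest-versus-longest pairing rule are all assumptions made for illustration. The only part taken from the text is the Bradley-Terry form of the loss, which scores a preferred (chosen) response against a dispreferred (rejected) one as -log sigmoid(r_chosen - r_rejected).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One sampled reasoning trajectory for a prompt (hypothetical record)."""
    text: str
    num_tokens: int
    correct: bool


def estimate_difficulty(samples: List[Sample]) -> float:
    """Assumed difficulty proxy: fraction of sampled trajectories that are wrong."""
    if not samples:
        raise ValueError("need at least one sample")
    wrong = sum(1 for s in samples if not s.correct)
    return wrong / len(samples)


def build_length_pair(
    samples: List[Sample], max_difficulty: float = 0.5
) -> Optional[Tuple[Sample, Sample]]:
    """Return (chosen, rejected) as (shortest correct, longest correct).

    Prompts that look too hard under the assumed proxy, or that do not have
    two correct trajectories of different length, yield no pair.
    """
    if estimate_difficulty(samples) > max_difficulty:
        return None
    correct = [s for s in samples if s.correct]
    if len(correct) < 2:
        return None
    chosen = min(correct, key=lambda s: s.num_tokens)
    rejected = max(correct, key=lambda s: s.num_tokens)
    if chosen.num_tokens == rejected.num_tokens:
        return None
    return chosen, rejected


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood: -log sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Made-up trajectories for a single prompt, for illustration only.
    samples = [
        Sample("short and right", 100, True),
        Sample("long and right", 400, True),
        Sample("wrong", 250, False),
        Sample("medium and right", 300, True),
    ]
    pair = build_length_pair(samples)
    if pair is not None:
        chosen, rejected = pair
        print(chosen.num_tokens, rejected.num_tokens)
    print(bradley_terry_loss(0.0, 0.0))
```

In a preference optimization setup, the two rewards passed to `bradley_terry_loss` would come from the objective being studied (for example, an implicit reward derived from model log-probabilities); here they are plain floats so the loss shape can be inspected in isolation.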

Results

Our experiments show that LCPO reduces the average output length of LRMs by over 50% across multiple benchmarks. This reduction does not come at the expense of reasoning performance, which indicates that the pruned length was largely unnecessary for reaching correct answers.
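As a small worked example of how such a length reduction can be computed, the sketch below compares mean output lengths before and after tuning. The function name and the token counts are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def relative_length_reduction(
    base_lengths: Sequence[int], pruned_lengths: Sequence[int]
) -> float:
    """Return 1 - mean(pruned) / mean(base), the fractional drop in mean length."""
    if not base_lengths or not pruned_lengths:
        raise ValueError("both length lists must be non-empty")
    base_mean = sum(base_lengths) / len(base_lengths)
    pruned_mean = sum(pruned_lengths) / len(pruned_lengths)
    if base_mean <= 0:
        raise ValueError("base mean length must be positive")
    return 1.0 - pruned_mean / base_mean


if __name__ == "__main__":
    # Hypothetical token counts for the same three prompts.
    base = [1000, 2000, 3000]    # mean 2000 tokens
    pruned = [400, 900, 1400]    # mean 900 tokens
    print(relative_length_reduction(base, pruned))
```

With these illustrative numbers the mean drops from 2000 to 900 tokens, a reduction of roughly 55%. A reduction figure like this is only meaningful when reported alongside task accuracy on the same prompts.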

Conclusion

In conclusion, our work shows that a computationally efficient method can guide LRMs toward effective reasoning without lengthy outputs. Length Controlled Preference Optimization offers a viable option for researchers and practitioners who want to use LRMs while keeping generation efficient.

Future Work

Future research may explore the scalability of LCPO and its application to different model architectures and tasks, further enhancing its adaptability in various domains.


Lazarus Omolua (https://richlyai.com/blog)