Entropy Trend Reward Boosts Efficient Chain-of-Thought AI

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

Source: arXiv:2604.05355v1

Abstract: Chain-of-thought (CoT) reasoning improves large language model performance on complex tasks, but often produces excessively long and inefficient reasoning traces. Existing methods shorten CoTs using length penalties or global entropy reduction, implicitly assuming that low uncertainty is desirable throughout reasoning. We show instead that reasoning efficiency is governed by the trajectory of uncertainty. CoTs with dominant downward entropy trends are substantially shorter. Motivated by this insight, we propose Entropy Trend Reward (ETR), a trajectory-aware objective that encourages progressive uncertainty reduction while allowing limited local exploration. We integrate ETR into Group Relative Policy Optimization (GRPO) and evaluate it across multiple reasoning models and challenging benchmarks. ETR consistently achieves a superior accuracy-efficiency tradeoff, improving DeepSeek-R1-Distill-7B by 9.9% in accuracy while reducing CoT length by 67% across four benchmarks. Code is available at https://github.com/Xuan1030/ETR.

Introduction

Chain-of-thought (CoT) reasoning improves large language model performance on tasks that require complex, multi-step reasoning. One significant challenge remains: models often generate excessively long and inefficient reasoning traces.

Challenges with Current Methods

Existing approaches to mitigate the length of chain-of-thought reasoning often utilize methods such as:

Length penalties
Global entropy reduction

Both methods implicitly assume that low uncertainty is desirable throughout reasoning. This overlooks what the paper identifies as the governing factor in reasoning efficiency: the trajectory of uncertainty, that is, how uncertainty changes over the course of a reasoning trace rather than how high it is on average.
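To make the distinction between uncertainty level and uncertainty trajectory concrete, here is a minimal sketch. It assumes uncertainty is measured as the Shannon entropy of the next-token distribution at each step; the function names and the toy entropy values are illustrative and are not taken from the paper.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(entropies):
    """A 'global' summary: average entropy over the whole trace."""
    return sum(entropies) / len(entropies)

# Two hypothetical traces with identical average entropy.
falling = [2.0, 1.5, 1.0, 0.5]  # uncertainty shrinks step by step
rising = [0.5, 1.0, 1.5, 2.0]   # uncertainty grows step by step

# A global measure cannot tell these two traces apart.
same_level = mean_entropy(falling) == mean_entropy(rising)
```

Both traces have the same mean entropy, so an objective that looks only at the overall level treats them identically, even though one converges and the other drifts.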

Introducing Entropy Trend Reward (ETR)

The authors report that chains of thought with dominant downward entropy trends are substantially shorter. Building on this observation, they propose the Entropy Trend Reward (ETR), a trajectory-aware objective designed to:

Encourage progressive uncertainty reduction
Allow for limited local exploration

By focusing on the trajectory of uncertainty rather than merely its overall level, ETR aims to optimize reasoning efficiency in a more nuanced manner.
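The two goals above can be illustrated with a toy scoring rule. This is a sketch under stated assumptions, not the paper's ETR formula, which this post does not reproduce: the function `entropy_trend_score`, its `tolerance` parameter, and the +1/0/-1 scoring are all hypothetical choices made for illustration.

```python
def entropy_trend_score(entropies, tolerance=0.2):
    """Toy trend score in [-1, 1] for a sequence of per-step entropies.

    Hypothetical rule (not the paper's definition):
      +1 for each step where entropy decreases,
       0 for a rise of at most `tolerance` (limited local exploration),
      -1 for a rise larger than `tolerance`.
    Returns the mean step score, or 0.0 if there are fewer than two steps.
    """
    if len(entropies) < 2:
        return 0.0
    scores = []
    for prev, curr in zip(entropies, entropies[1:]):
        delta = curr - prev
        if delta < 0:
            scores.append(1.0)
        elif delta <= tolerance:
            scores.append(0.0)
        else:
            scores.append(-1.0)
    return sum(scores) / len(scores)

steady = entropy_trend_score([2.0, 1.5, 1.0, 0.5])         # every step decreases
explore = entropy_trend_score([2.0, 1.5, 1.6, 1.0, 0.5])   # one small, tolerated rise
drifting = entropy_trend_score([0.5, 1.0, 1.5, 2.0])       # every step rises sharply
```

Under this rule a steadily converging trace scores highest, a trace with a brief small rise is only mildly discounted, and a trace whose uncertainty keeps growing is penalized.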

Integration with Group Relative Policy Optimization (GRPO)

The authors integrate ETR into the Group Relative Policy Optimization (GRPO) framework and evaluate the combination across multiple reasoning models and challenging benchmarks, where it consistently achieves a better accuracy-efficiency tradeoff.
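As a rough illustration of how a trajectory-based reward could enter GRPO, the sketch below computes group-relative advantages by normalizing each sampled response's reward against the mean and standard deviation of its group, which is the core idea of GRPO. How the paper actually combines correctness with ETR is not stated here, so `combined_reward`, `trend_weight`, and the additive form are assumptions, as is the use of the population standard deviation.

```python
import statistics

def combined_reward(is_correct, trend_reward, trend_weight=0.5):
    """Hypothetical reward: correctness (0 or 1) plus a weighted trend term."""
    return (1.0 if is_correct else 0.0) + trend_weight * trend_reward

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]

# Four hypothetical responses to one prompt: (is_correct, trend_reward).
group = [(True, 1.0), (True, -1.0), (False, 1.0), (False, -1.0)]
rewards = [combined_reward(ok, trend) for ok, trend in group]
advantages = group_relative_advantages(rewards)
```

In this toy group, the correct response with a downward entropy trend receives the largest advantage, and the incorrect response with a rising trend receives the smallest. The `eps` term keeps the division defined when every response in a group has the same reward.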

Results and Achievements

For DeepSeek-R1-Distill-7B, the paper reports the following across four benchmarks:

Accuracy improved by 9.9%
CoT length reduced by 67%

These findings highlight the potential of ETR to enhance the efficiency of reasoning processes in large language models, paving the way for more effective AI applications.

Conclusion

With the introduction of the Entropy Trend Reward, researchers and developers have a new tool at their disposal to optimize chain-of-thought reasoning in large language models. By prioritizing the trajectory of uncertainty, ETR represents a significant advancement in achieving a favorable accuracy-efficiency balance in AI reasoning.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
