Entropy Centroids for Efficient Test-Time Scaling in LLMs

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

Recent progress in large language models (LLMs) has increased interest in efficient test-time scaling. One promising approach samples multiple responses and selects the best one, similar to the approach used by Grok Heavy and Gemini Deep Think. However, traditional selection techniques often depend on external reward models, which must be trained to be robust and which add computational overhead.

The paper “Entropy Centroids as Intrinsic Rewards for Test-Time Scaling” (arXiv:2604.26173v1) proposes a method that relies on intrinsic signals instead, using the model's own uncertainty to improve response quality without external rewards.

Understanding Intrinsic Signals

Prior methods have investigated intrinsic signals such as confidence and entropy, but these signals are often unreliable when aggregated naively. The authors observe that, during inference, high-entropy tokens tend to appear in consecutive groups, and that these groups indicate model uncertainty more stably than individual tokens do. The clustering exposes temporal patterns of uncertainty that can inform response selection.
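As a point of reference, the per-token entropy these signals build on is the Shannon entropy of the model's next-token distribution. The snippet below is a minimal sketch of that quantity only; the function name `token_entropy` and the toy 4-token distributions are assumptions made for illustration and do not come from the paper or its repository.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical distributions over a toy 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]   # model strongly prefers one token
uncertain = [0.25, 0.25, 0.25, 0.25]   # model is maximally unsure

low = token_entropy(confident)
high = token_entropy(uncertain)
```

A uniform distribution over four tokens gives the maximum possible value, ln 4, while a peaked distribution gives a value close to zero. In practice the probabilities would come from a softmax over the model's logits at each generation step.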

Introducing High Entropy Phases (HEPs)

The researchers introduce High Entropy Phases (HEPs): variable-length segments that begin at a high-entropy token and end when a run of low-entropy tokens appears. HEPs serve as the basic unit for measuring segment-level uncertainty, and the study builds its intrinsic reward on the temporal structure of these segments.
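The segmentation rule described above can be sketched as a single pass over a list of per-token entropies. This is an illustration under stated assumptions, not the authors' implementation: the threshold `tau`, the `patience` count of low-entropy tokens that closes a phase, and the choice to end each phase at its last high-entropy token are all hypothetical, since the article does not give the paper's exact values or boundary convention.

```python
def find_heps(entropies, tau=1.0, patience=3):
    """Return (start, end) index pairs, end exclusive, one per phase.

    tau and patience are assumed parameters, not values from the paper.
    """
    phases = []
    start = None       # first index of the currently open phase
    last_high = None   # most recent high-entropy index in that phase
    low_run = 0        # consecutive low-entropy tokens seen so far
    for i, h in enumerate(entropies):
        if h >= tau:
            if start is None:
                start = i          # a high-entropy token opens a phase
            last_high = i
            low_run = 0
        elif start is not None:
            low_run += 1
            if low_run >= patience:
                # Enough low-entropy tokens in a row: close the phase.
                phases.append((start, last_high + 1))
                start, low_run = None, 0
    if start is not None:
        # The sequence ended while a phase was still open.
        phases.append((start, last_high + 1))
    return phases

# Hypothetical entropy trace for a 10-token response.
trace = [0.1, 2.0, 0.2, 1.5, 0.1, 0.1, 0.1, 0.1, 3.0, 0.1]
phases = find_heps(trace)
```

With these assumed settings, the single low-entropy token at index 2 does not close the first phase, because it is followed by another high-entropy token before `patience` is reached.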

Defining the Entropy Centroid

Building on HEPs, the study introduces the Entropy Centroid, inspired by the center of mass in physics. The Entropy Centroid is the weighted average position of all HEPs along the inference trajectory. The key insight is that a lower centroid typically indicates early exploration followed by confident generation, which often correlates with higher response quality.
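To make the center-of-mass analogy concrete, here is a rough sketch of one way such a centroid could be computed. The weighting is an assumption: each phase is given a mass equal to the sum of its token entropies and a position equal to its midpoint, normalized to [0, 1] by response length. The paper may define the weights and positions differently, and the function name `entropy_centroid` is hypothetical. Phases are passed in as (start, end) index pairs with an exclusive end.

```python
def entropy_centroid(entropies, phases):
    """Mass-weighted mean position of the phases, in [0, 1].

    Assumed convention: mass = summed entropy of the phase,
    position = phase midpoint divided by the last token index.
    """
    n = len(entropies)
    total_mass = 0.0
    weighted_pos = 0.0
    for start, end in phases:
        mass = sum(entropies[start:end])
        midpoint = (start + end - 1) / 2.0
        position = midpoint / max(n - 1, 1)   # normalize by length
        total_mass += mass
        weighted_pos += mass * position
    if total_mass == 0.0:
        return 0.0   # assumed fallback when no phase carries any mass
    return weighted_pos / total_mass

# Two hypothetical 11-token responses with one phase each.
early_trace = [2.0, 2.0] + [0.1] * 9      # uncertainty at the start
late_trace = [0.1] * 9 + [2.0, 2.0]       # uncertainty at the end

early = entropy_centroid(early_trace, [(0, 2)])
late = entropy_centroid(late_trace, [(9, 11)])
```

Under this convention the response that explores early gets a centroid near 0 and the one that becomes uncertain late gets a centroid near 1. Normalizing by length is a design choice made here so that responses of different lengths are comparable.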

The Lowest Centroid Method

Based on this finding, the researchers propose the Lowest Centroid method, which selects the response with the lowest entropy centroid from a pool of candidates. Because the score is derived from the model's own uncertainty, the method reduces reliance on external reward models.
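The selection step itself reduces to a minimum over per-response scores. The sketch below assumes each candidate already has a centroid attached; the function name `select_lowest_centroid`, the pair layout, and the numeric scores are hypothetical and chosen only to illustrate the rule.

```python
def select_lowest_centroid(candidates):
    """Pick the response text with the smallest centroid score.

    candidates: list of (response_text, centroid) pairs.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    # min() returns the first pair on ties, so earlier samples win ties.
    return min(candidates, key=lambda pair: pair[1])[0]

# Hypothetical pool of sampled responses with made-up centroid scores.
pool = [("answer A", 0.62), ("answer B", 0.18), ("answer C", 0.41)]
best = select_lowest_centroid(pool)
```

Since scoring needs only the token entropies already produced during sampling, this step adds no extra model calls beyond generating the candidate pool.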

Experimental Results

The authors ran experiments on mathematics, code generation, logical reasoning, and agentic tasks, using models ranging from 14 billion to 480 billion parameters. The Lowest Centroid method consistently outperformed existing baselines, and its gains remained stable as model size increased.

Conclusion and Future Directions

The approach improves the efficiency of test-time scaling and opens avenues for further work on intrinsic rewards. The temporal structure of model uncertainty is a signal that future methods may exploit to make LLMs more robust across diverse domains.

The code for this research is publicly available at https://github.com/hkust-nlp/entropy-centroid.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!