Local Scoring with LALP for Better AI Reasoning Data

The Signal is in the Steps: Local Scoring for Reasoning Data Selection

Source: arXiv:2510.03988v2

Abstract: Distilling long-form reasoning from teacher models into smaller students requires selecting which candidate solutions to train on. Recent work argues that one should select responses the student model assigns highest probability, i.e., favoring solutions “natural” to the student. However, we find that this approach works within a single teacher but fails when scaling to long reasoning traces from multiple diverse teachers. We identify a key cause: this approach scores entire solutions, but students generalize by recombining familiar reasoning steps, not by memorizing complete solutions. Full-trajectory scoring optimizes the wrong target; it rewards global fluency while the transferable signal lies in local step transitions. We propose Local Average Log Probability (LALP), which scores each reasoning step using only a small window of preceding context, measuring whether each step is justified by its immediate premises rather than whether the full response looks natural to the student. LALP enables two practical use cases: selecting the best teacher before fine-tuning and curating training data from diverse teacher pools. Across math, coding, and science reasoning tasks, LALP consistently improves accuracy when selecting the most natural solutions by a large margin.

Introduction

Training a smaller student model to reproduce the reasoning of a larger teacher requires deciding which candidate solutions the student should learn from. Recent work selects the responses to which the student model assigns the highest probability, on the view that these solutions are the most "natural" to the student. According to the paper, this works when all candidates come from a single teacher but fails when the pool contains long reasoning traces from multiple diverse teachers.

Challenges with Current Methods

The authors trace this failure to full-trajectory scoring, which assigns a single score to an entire solution. Such a score rewards global fluency, while the signal that transfers to the student lies in the local transitions between reasoning steps. Students generalize by recombining familiar reasoning steps, not by memorizing complete solutions, so a whole-solution score optimizes the wrong target.
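To make the target of this criticism concrete, a full-trajectory score can be written as a length-normalized log-probability of the whole response. The formula below is a sketch under that assumption. The symbols (x for the problem, y_1 ... y_T for the response tokens, p_S for the student model) are notation introduced here and are not taken from the paper.

```latex
\[
  s_{\text{global}}(y \mid x) \;=\; \frac{1}{T} \sum_{t=1}^{T} \log p_S\!\left(y_t \mid x,\, y_{<t}\right)
\]
```

Every token is conditioned on the entire preceding response, so the score reflects how fluent the complete solution looks to the student, not whether any individual step follows from the one before it.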

Introducing Local Average Log Probability (LALP)

To address this, the authors propose Local Average Log Probability (LALP). LALP scores each reasoning step using only a small window of preceding context. It measures whether each step is justified by its immediate premises rather than whether the full response looks natural to the student, which shifts the focus from global fluency to local step transitions.
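The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the step splitter (one step per line), the window measured in steps, the unweighted mean over steps, and the `token_logprobs` interface are all assumptions made here for illustration. The `toy_token_logprobs` function is a stand-in for a real student model so that the example runs on its own.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given (context, continuation), return one
# log-probability per token of `continuation` under the student model.
TokenLogProbs = Callable[[str, str], List[float]]


def split_steps(response: str) -> List[str]:
    """Assumption: each non-empty line is one reasoning step."""
    return [line.strip() for line in response.splitlines() if line.strip()]


def lalp_score(
    steps: Sequence[str],
    token_logprobs: TokenLogProbs,
    window: int = 1,
) -> float:
    """Mean over steps of each step's average token log-probability,
    where every step sees only the `window` steps that precede it."""
    if window < 0:
        raise ValueError("window must be non-negative")
    step_scores = []
    for i, step in enumerate(steps):
        # Local context: at most `window` preceding steps, nothing earlier.
        context = "\n".join(steps[max(0, i - window):i])
        lps = token_logprobs(context, step)
        if lps:
            step_scores.append(sum(lps) / len(lps))
    if not step_scores:
        raise ValueError("no scorable steps")
    return sum(step_scores) / len(step_scores)


def toy_token_logprobs(context: str, continuation: str) -> List[float]:
    """Toy stand-in for a student model: a whitespace token that already
    appears in the context gets -0.5, any other token gets -2.0."""
    seen = set(context.split())
    return [-0.5 if tok in seen else -2.0 for tok in continuation.split()]


steps = split_steps("x = 2\ny = x + 1\ny = 3")
local_score = lalp_score(steps, toy_token_logprobs, window=1)
```

With a real student model, `token_logprobs` would wrap a forward pass that returns the log-probabilities of the step's tokens given the windowed context. Whether the original problem statement is also included in that context is left open here because the source does not say.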

Practical Applications of LALP

LALP enables two practical use cases:

  • Selecting the Best Teacher: Before fine-tuning, LALP can be used to choose which teacher model the student should be distilled from.
  • Curating Training Data: LALP can be used to select training examples from a pool of solutions produced by diverse teachers.
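Both use cases reduce to ranking by a per-solution score. The sketch below assumes each candidate solution already carries such a score (for example, a LALP value, where higher means more natural to the student). The `Candidate` type, the mean-per-teacher rule, and the top-k-per-problem rule are illustrative assumptions, not the selection procedure from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import List, NamedTuple


class Candidate(NamedTuple):
    problem_id: str
    teacher: str
    score: float  # assumed precomputed; higher = more natural to the student


def best_teacher(candidates: List[Candidate]) -> str:
    """Use case 1 (assumed rule): pick the teacher with the highest mean score."""
    by_teacher = defaultdict(list)
    for c in candidates:
        by_teacher[c.teacher].append(c.score)
    return max(by_teacher, key=lambda t: mean(by_teacher[t]))


def curate(candidates: List[Candidate], per_problem: int = 1) -> List[Candidate]:
    """Use case 2 (assumed rule): keep the top-scoring solutions per problem,
    regardless of which teacher produced them."""
    by_problem = defaultdict(list)
    for c in candidates:
        by_problem[c.problem_id].append(c)
    selected = []
    for pid in by_problem:
        ranked = sorted(by_problem[pid], key=lambda c: c.score, reverse=True)
        selected.extend(ranked[:per_problem])
    return selected


# Made-up scores, for illustration only.
pool = [
    Candidate("p1", "A", -1.2),
    Candidate("p1", "B", -0.9),
    Candidate("p2", "A", -1.0),
    Candidate("p2", "B", -1.5),
]
chosen_teacher = best_teacher(pool)
training_set = curate(pool, per_problem=1)
```

In this toy pool, teacher A has the better average score, yet the curated set mixes both teachers because each problem keeps its own top-scoring solution.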

Results and Conclusion

Across math, coding, and science reasoning tasks, the paper reports that LALP consistently improves accuracy, by a large margin, when used to select the most natural solutions. The result suggests that data selection for distillation should score local reasoning transitions rather than the fluency of complete responses.


Lazarus Omolua
https://richlyai.com/blog