Small-Scale Disposition Distillation: Key Negative Findings

Disposition Distillation at Small Scale: A Three-Arc Negative Result

Source: arXiv:2604.11867v1 (announce type: cross)

Introduction

Distilling behavioral dispositions into small language models is appealing, but it is hard to do without side effects. The study summarized here set out to train three dispositions (self-verification, uncertainty acknowledgment, and feedback integration) into models with 0.6B to 2.3B effective parameters, using a four-stage distillation pipeline developed at MIT.

Methodology

The work had three components:

  • A four-stage distillation pipeline that trains behavioral dispositions into small language models.
  • Follow-on experiments on inference-time attention-head interventions.
  • A frozen-base confidence-gated sidecar.
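
To make the "confidence-gated sidecar" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the sigmoid probe, the 0.5 threshold, and the appended note are all hypothetical. The only detail taken from the article is that the base model stays frozen and the sidecar reads a hidden state.

```python
import math
from typing import Sequence


def probe_confidence(h_last: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical linear probe on the final-token hidden state, squashed to (0, 1)."""
    z = sum(h_i * w_i for h_i, w_i in zip(h_last, w)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gated_answer(base_answer: str, h_last: Sequence[float], w: Sequence[float],
                 b: float, threshold: float = 0.5) -> str:
    """Leave the frozen base model's answer untouched when the probe is confident;
    otherwise append an uncertainty note (an assumed form of the gate's action)."""
    if probe_confidence(h_last, w, b) >= threshold:
        return base_answer
    return base_answer + " (low confidence; please verify)"


# Toy usage with made-up two-dimensional hidden states and probe weights.
weights, bias = [2.0, 0.0], 0.0
print(gated_answer("Paris", [1.0, 0.0], weights, bias))   # probe is confident
print(gated_answer("Paris", [-1.0, 0.0], weights, bias))  # probe is not confident
```

Because the gate only wraps the base output, any change in behavior depends entirely on whether the probe's score tracks correctness, which is the property the study goes on to test.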

An internal draft initially reported large gains for the Qwen3-0.6B model: +33.9 points on MCAS and +15.3 points on HumanEval. Neither number survived later checks.

Findings and Results

Subsequent sanity checks showed that both reported gains were artifacts of the evaluation setup:

  • The HumanEval delta was a truncation artifact: it reversed to -8.0 points when the prediction count was raised from 512 to 1024.
  • The MCAS gain vanished under apples-to-apples scoring.
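
A toy example shows how a generation cap can flip the sign of a benchmark delta. It assumes that the "prediction count" is a cap on generated tokens and that a completion cut off by the cap counts as a failure. All sample lengths and correctness flags below are invented for illustration and are not the paper's data.

```python
def pass_rate(samples, max_new_tokens):
    """Percentage of samples that are correct AND fit within the generation cap.

    samples: list of (length_in_tokens, correct_if_complete) pairs.
    """
    passed = sum(1 for length, correct in samples
                 if correct and length <= max_new_tokens)
    return 100.0 * passed / len(samples)


def delta(student, base, max_new_tokens):
    """Student minus base pass rate, in percentage points."""
    return pass_rate(student, max_new_tokens) - pass_rate(base, max_new_tokens)


# Made-up data: the base model writes long but correct answers,
# the student writes short answers that are right only half the time.
base = [(600, True), (700, True), (300, True), (800, True)]
student = [(200, True), (250, True), (300, False), (400, False)]

print(delta(student, base, 512))   # 25.0: the cap truncates most base answers
print(delta(student, base, 1024))  # -50.0: with room to finish, the base model wins
```

The point of the sketch is only that a delta measured under a tight cap compares truncation behavior as much as ability, so a comparison should be repeated at a cap large enough for both models to finish.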

These falsifications led to three additional arcs of investigation:

  • Applying SFT/DPO LoRA across three model families and two domains.
  • Experimenting with inference-time attention-head tempering on the output projection layer (o_proj).
  • Deploying a training-free frozen-base sidecar that analyzed the final-token hidden state (h_last).
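
The second arc can be pictured with a small sketch. It assumes that "tempering" means scaling selected heads' contributions just before the output projection, which the article does not spell out. The helper names, the scale factor, and the identity weight matrix are hypothetical, and plain lists stand in for tensors.

```python
def temper_heads(concat_heads, head_dim, scales):
    """Scale each selected head's slice of the concatenated attention output.

    concat_heads: flat list of length n_heads * head_dim (heads laid out in order).
    scales: dict mapping head index -> multiplicative factor (assumed form of tempering).
    """
    out = list(concat_heads)
    for head, alpha in scales.items():
        start = head * head_dim
        for i in range(start, start + head_dim):
            out[i] *= alpha
    return out


def o_proj(x, weight):
    """Apply an output projection; weight is a list of rows, one per output dimension."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


# Toy usage: two heads of width 2, identity projection, head 1 damped by half.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
heads = [1.0, 2.0, 3.0, 4.0]
print(o_proj(temper_heads(heads, head_dim=2, scales={1: 0.5}), identity))
# [1.0, 2.0, 1.5, 2.0]
```

Since the projection is linear, scaling a head's slice before it is equivalent to scaling that head's contribution to the layer output, which is why an intervention of this kind needs no retraining.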

Across all three arcs, no operator improved judge-measured disposition without either degrading content quality or collapsing into stylistic mimicry. The failure held for all five tested models: Qwen3-0.6B, Qwen3-1.7B, Qwen3.5-0.8B, Gemma 4 E2B, and SmolLM2-1.7B-Instruct.

Conclusion

The study contributes a three-arc negative result with detailed mechanisms and a two-failure-mode taxonomy for linear h_last probes. It also establishes an honest falsification pipeline that turns previously generated false positives into publishable negatives. As an independent observation, the Gemma 4 E2B model showed a near-complete decoupling of confidence and correctness in the Chef domain: it asserted at 91% whether or not the answer was correct, which raises questions about how far its stated confidence can be trusted.
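
One simple way to quantify this kind of decoupling is to compare how often a model asserts an answer when it is right with how often it asserts when it is wrong. The sketch below is an assumed formulation with hypothetical function names and invented records. It is not the paper's metric or data.

```python
def assertion_rate(records, correct_flag):
    """Fraction of answers asserted, among records with the given correctness.

    records: list of (asserted, correct) boolean pairs.
    """
    group = [asserted for asserted, correct in records if correct == correct_flag]
    if not group:
        raise ValueError("no records with the requested correctness flag")
    return sum(group) / len(group)


def assertion_gap(records):
    """Assertion rate when correct minus assertion rate when incorrect.

    A gap near zero means confidence carries no information about correctness.
    """
    return assertion_rate(records, True) - assertion_rate(records, False)


# Invented examples: a model whose assertions track correctness, and one that
# asserts 90% of the time no matter what.
calibrated = ([(True, True)] * 9 + [(False, True)]
              + [(True, False)] * 2 + [(False, False)] * 8)
decoupled = ([(True, True)] * 9 + [(False, True)]
             + [(True, False)] * 9 + [(False, False)])

print(round(assertion_gap(calibrated), 3))  # 0.7
print(round(assertion_gap(decoupled), 3))   # 0.0
```

A high overall assertion rate is not a problem by itself. The warning sign is a gap near zero, because then a downstream gate or reader cannot use the model's confidence to separate right answers from wrong ones.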

The research underscores how hard it is to distill behavioral dispositions into small models, and why results need rigorous validation before publication.


Author: Lazarus Omolua (https://richlyai.com/blog)