Measuring Intrinsic Non-Randomness in Language Models

The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions

How random are a language model's token distributions? The paper “The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions” (arXiv:2604.22771v1) introduces a metric called Entropic Deviation (ED) to measure this non-randomness systematically across a range of models and configurations.

Understanding Entropic Deviation (ED)

Entropic Deviation is defined as the normalized Kullback-Leibler divergence between a language model’s token distribution and a uniform distribution. This metric offers a quantitative measure of how much a model’s output deviates from randomness, providing insights into the intrinsic characteristics of the model’s learned weights.
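
To make the definition concrete, here is a minimal sketch of how such a metric could be computed for a single token distribution. The article does not say how the divergence is normalized, so the sketch assumes division by log V (the largest value the divergence from uniform can take for a vocabulary of size V). Under that assumption, ED is 0 for a uniform distribution and 1 for a distribution that puts all its mass on one token. The function name entropic_deviation is illustrative and not taken from the paper.

```python
import math


def entropic_deviation(probs):
    """Normalized KL divergence between `probs` and the uniform distribution.

    Assumption: the normalizer is log(V), where V is the vocabulary size.
    The paper may use a different normalization.
    """
    vocab_size = len(probs)
    if vocab_size < 2:
        raise ValueError("need at least two tokens")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # KL(p || U) = sum_i p_i * log(p_i / (1/V)) = sum_i p_i * log(p_i * V).
    # Tokens with zero probability contribute nothing to the sum.
    kl = sum(p * math.log(p * vocab_size) for p in probs if p > 0.0)
    return kl / math.log(vocab_size)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximally random: ED is 0
one_hot = [1.0, 0.0, 0.0, 0.0]       # fully deterministic: ED is 1
print(entropic_deviation(uniform), entropic_deviation(one_hot))
```

Because KL(p || U) equals log V minus the entropy of p, the same quantity can be read as one minus the normalized entropy.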

Scope of the Study: The research analyzes 31,200 generations from seven models, varying architecture, prompt category, temperature, and language.
Architectures Examined: The study includes two primary architectures: transformer models and state space models.
Prompt Categories: Nine different semantic prompt categories were utilized, including semantically neutral prompts such as empty strings and random characters.
Temperature Settings: The experiments were conducted using three different temperature settings, affecting the models’ output variability.
Language Diversity: The analysis spans five languages, allowing for cross-lingual comparisons.
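
As a rough illustration of why temperature matters for a measurement like this, the sketch below applies temperature scaling to a made-up logit vector and computes a normalized KL divergence from uniform for each setting. The logits, the three temperature values, and the log V normalization are all assumptions for illustration; the article does not list the temperatures used, nor does it say whether the paper computes ED on the temperature-scaled distribution or on the unscaled one. The sketch only shows the mechanical effect of temperature on a single distribution.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropic_deviation(probs):
    """KL(p || uniform) divided by log(V); the normalizer is an assumption."""
    vocab_size = len(probs)
    kl = sum(p * math.log(p * vocab_size) for p in probs if p > 0.0)
    return kl / math.log(vocab_size)


# Hypothetical logits for a five-token vocabulary.
logits = [4.0, 2.0, 1.0, 0.5, 0.0]

# Hypothetical temperature settings; higher values flatten the distribution.
ed_by_temperature = {
    t: entropic_deviation(softmax_with_temperature(logits, t))
    for t in (0.5, 1.0, 1.5)
}
for t, ed in ed_by_temperature.items():
    print(f"temperature={t}: ED={ed:.3f}")
```

For a fixed set of logits, raising the temperature increases entropy and therefore lowers this quantity.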

Key Findings

The study reports four main findings about non-randomness in language models:

Intrinsic Non-Randomness: Even under semantically neutral prompts, transformer models exhibit an ED of approximately 0.30. The paper attributes 88-93% of the observed non-randomness to the models’ learned weights rather than to the context supplied by the prompt.
Consistency Among Transformers: Three transformer families (Gemma, Llama, and Qwen) showed nearly identical ED values despite differences in their training data and vocabularies.
Contrasts with State Space Models: The state space model Mamba2 behaved qualitatively differently. It showed twice the ED of the transformers, one third of their within-sequence variance, and a pronounced sensitivity to temperature, whereas the transformers were largely unaffected by temperature.
Cross-Lingual Stability: Experiments with the Qwen-32B model revealed a stable gradient of ED across five languages (English, Japanese, Chinese, Polish, Arabic). The gradient did not correlate with token fertility (the average number of tokens per word) and persisted even when comparing languages that share identical tokenizer subsets.
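
The within-sequence variance mentioned above can be sketched as follows: compute a normalized KL divergence from uniform at every generation step, then take the mean and variance over the steps of one sequence. The toy distributions, the function names, the log V normalization, and the choice of population variance are assumptions for illustration rather than the paper's procedure.

```python
import math
from statistics import mean, pvariance


def entropic_deviation(probs):
    """KL(p || uniform) divided by log(V); the normalizer is an assumption."""
    vocab_size = len(probs)
    kl = sum(p * math.log(p * vocab_size) for p in probs if p > 0.0)
    return kl / math.log(vocab_size)


def sequence_ed_stats(step_distributions):
    """Return (mean, population variance) of per-step ED over one sequence."""
    per_step = [entropic_deviation(p) for p in step_distributions]
    return mean(per_step), pvariance(per_step)


# Toy sequences over a four-token vocabulary (made-up numbers).
steady = [[0.7, 0.1, 0.1, 0.1]] * 3   # same distribution at every step
varied = [
    [0.25, 0.25, 0.25, 0.25],         # uniform step
    [1.0, 0.0, 0.0, 0.0],             # deterministic step
    [0.7, 0.1, 0.1, 0.1],             # peaked step
]

for name, seq in (("steady", steady), ("varied", varied)):
    avg, var = sequence_ed_stats(seq)
    print(f"{name}: mean ED={avg:.3f}, within-sequence variance={var:.4f}")
```

A model whose per-step distributions barely change, like the steady sequence here, yields a low within-sequence variance regardless of how high its mean ED is.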

Implications for Future Research

This research establishes a structural lower bound on randomness in pretrained language models, characterizing how this bound varies across different architectures. The demonstration that language itself can modulate this bound independently of tokenization opens new avenues for exploration in model design and evaluation. Understanding these intrinsic properties may significantly impact the development of more robust and interpretable language models in the future.

As the field of AI continues to evolve, the insights gained from such studies will be pivotal in refining the capabilities of language models and enhancing their utility across various applications.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
