Spectral Entropy Collapse Signals Delayed Grokking in ML

Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking

Source: arXiv:2604.13123v1

Abstract

Grokking, the delayed generalisation that occurs long after an initial memorisation phase, remains poorly understood. This study identifies the normalised spectral entropy $\tilde{H}(t)$ of the representation covariance as a scalar order parameter for the grokking transition, and validates it in experiments on one-layer Transformers trained on group-theoretic tasks.
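To make the quantity concrete, here is a minimal sketch of how a normalised spectral entropy of a representation covariance could be computed. The abstract does not spell out the formula, so this assumes a common definition: the Shannon entropy of the covariance eigenvalues (rescaled to sum to 1), divided by $\log d$ so the result lies in $[0, 1]$. The function name and the `reps` argument are illustrative, not taken from the paper's code.

```python
import numpy as np


def normalised_spectral_entropy(reps: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy of the covariance eigenvalue spectrum, scaled to [0, 1].

    reps: array of shape (n_samples, d) holding hidden representations.
    Assumed definition (not confirmed by the source):
        p_i = lambda_i / sum_j lambda_j
        H~  = -sum_i p_i * log(p_i) / log(d)
    """
    n_samples, dim = reps.shape
    if dim < 2:
        return 0.0  # log(1) = 0, so the normalisation is undefined

    # Covariance of the centred representations.
    centred = reps - reps.mean(axis=0, keepdims=True)
    cov = centred.T @ centred / max(n_samples - 1, 1)

    # eigvalsh is for symmetric matrices; clip tiny negative round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    total = eigvals.sum()
    if total <= eps:
        return 0.0  # all representations identical

    p = eigvals / total
    p = p[p > eps]  # treat 0 * log(0) as 0
    entropy = -np.sum(p * np.log(p))
    return float(entropy / np.log(dim))
```

Under this definition, representations spread evenly across all directions give a value near 1, while representations confined to a single direction give a value near 0. "Entropy collapse" then means $\tilde{H}(t)$ falling as training concentrates variance in a few directions.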

Key Contributions

  • Two-Phase Pattern: Grokking follows a distinct two-phase pattern: norm expansion, followed by entropy collapse.
  • Stable Threshold: In 100% of observed runs, $\tilde{H}$ crosses a stable threshold of $\tilde{H}^* \approx 0.61$ before generalisation, with an average lead time of 1,020 steps.
  • Causal Intervention: An intervention that prevents the entropy collapse delays grokking by an average of 5,020 steps ($p=0.044$). A norm-matched control ($n=30$, $p=5\times10^{-5}$) supports the conclusion that entropy, not norm, drives the transition.
  • Power-Law Prediction: A power law, $\Delta T = C_1(\tilde{H}-\tilde{H}^*)^\gamma+C_2$ (with $R^2=0.543$), predicts the onset of grokking with an error margin of 4.1%.
  • Architecture Matters: The mechanism holds across both abelian ($\mathbb{Z}/97\mathbb{Z}$) and non-abelian ($S_5$) groups. Multi-layer perceptrons (MLPs), however, exhibit entropy collapse without grokking. Entropy collapse is therefore necessary but not sufficient, and architecture also matters.
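The threshold and power-law bullets above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: "crossing" is taken to mean the first logged step at which $\tilde{H}$ falls below $\tilde{H}^*$, and the constants `c1`, `gamma` and `c2` are hypothetical placeholders, because the summary does not report the fitted values. All function names are invented for the example.

```python
from typing import Optional, Sequence

H_STAR = 0.61  # threshold value reported in the summary


def first_crossing_step(
    steps: Sequence[int],
    entropies: Sequence[float],
    threshold: float = H_STAR,
) -> Optional[int]:
    """Return the first logged step where the entropy is below the threshold.

    Assumes "crossing" means a downward crossing, since the entropy collapses.
    Returns None if the threshold is never crossed.
    """
    for step, h in zip(steps, entropies):
        if h < threshold:
            return step
    return None


def lead_time(crossing_step: int, grok_step: int) -> int:
    """Steps between the threshold crossing and the onset of generalisation."""
    return grok_step - crossing_step


def predicted_delay(
    h: float,
    c1: float,
    gamma: float,
    c2: float,
    h_star: float = H_STAR,
) -> float:
    """Evaluate Delta T = C1 * (H - H*)^gamma + C2 for an entropy above H*.

    c1, gamma and c2 are placeholders: the fitted constants are not given
    in the summary and would have to come from the paper or a fresh fit.
    """
    if h < h_star:
        raise ValueError("the power law is only evaluated for h >= h_star")
    return c1 * (h - h_star) ** gamma + c2


# Toy entropy trace, made up for illustration only.
steps = [0, 100, 200, 300]
entropies = [0.90, 0.70, 0.60, 0.50]
crossing = first_crossing_step(steps, entropies)
```

In this toy trace the entropy first falls below 0.61 at step 200. With real training logs, `lead_time(crossing, grok_step)` would give the per-run analogue of the 1,020-step average quoted above.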

Conclusion

This research positions spectral entropy collapse as an empirical signature of grokking: it precedes generalisation, preventing it delays grokking, and it is necessary but not sufficient for the transition. These findings may guide further work on which architectures are capable of delayed generalisation. The authors have released code for anyone who wants to replicate or extend the work.


Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
