Temporal Knowledge Drift in LLMs: Geometry of Forgetting

The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations

A recent arXiv paper, “The Geometry of Forgetting,” examines why large language models (LLMs) produce outdated answers. It argues that temporal drift in knowledge representation is not an engineering oversight but a structural property of these models.

The authors argue that temporal drift, meaning that a fact has changed since the model was trained, is encoded as a direction in the residual stream that is orthogonal to the directions for correctness and uncertainty. Methods built on correctness or uncertainty signals are therefore inherently limited when it comes to detecting drift.

Key Findings

Structural Issue: Temporal drift is identified as a structural problem rather than an engineering flaw.
Geometric Encoding: The study reveals that drift is represented in a geometric direction that is distinct from correctness and uncertainty.
Model Evaluation: The research evaluates six instruction-tuned models and finds a large gap between a linear probe trained on drift labels and existing uncertainty-based methods at detecting temporal drift.
Probing Accuracy: A linear probe trained on drift labels achieved an area under the receiver operating characteristic curve (AUROC) ranging from 0.83 to 0.95, indicating strong performance in identifying temporal drift.
Limitations of Existing Methods: Traditional methods based on token entropy, semantic entropy, and others scored near chance levels (0.49 to 0.57), demonstrating their ineffectiveness in detecting drift.
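
To make the AUROC figures above concrete, here is a minimal sketch on synthetic data, not the authors' setup. It assumes made-up 8-dimensional "activations" in which drifted facts are shifted along one axis, uses a difference-of-means direction as a stand-in for a trained linear probe, and compares it with a score that reads an unrelated axis. All names (`make_activations`, `probe`, `baseline_auroc`) are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def make_activations(rng, n, dim, drift_shift):
    """Synthetic stand-in for activations: Gaussian noise, shifted along axis 0."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += drift_shift
        rows.append(x)
    return rows

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

rng = random.Random(0)
DIM = 8  # assumption: tiny dimension for illustration only
train_drift = make_activations(rng, 200, DIM, drift_shift=2.0)
train_fresh = make_activations(rng, 200, DIM, drift_shift=0.0)
test_drift = make_activations(rng, 200, DIM, drift_shift=2.0)
test_fresh = make_activations(rng, 200, DIM, drift_shift=0.0)

# Difference-of-means direction: a simple linear probe fitted on the training split.
mu_drift, mu_fresh = mean_vector(train_drift), mean_vector(train_fresh)
probe = [a - b for a, b in zip(mu_drift, mu_fresh)]

probe_auroc = auroc([dot(x, probe) for x in test_drift],
                    [dot(x, probe) for x in test_fresh])

# A score that only reads axis 1, which carries no drift information here.
baseline_auroc = auroc([x[1] for x in test_drift],
                       [x[1] for x in test_fresh])

print(f"probe AUROC:    {probe_auroc:.2f}")
print(f"baseline AUROC: {baseline_auroc:.2f}")
```

An AUROC of 0.5 means the score ranks drifted and non-drifted examples no better than a coin flip, which is why values of 0.49 to 0.57 count as near chance.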

Experimental Validation

The authors conducted five distinct tests to confirm the orthogonality of temporal drift from correctness and uncertainty. Key experimental results include:

Weight Cosines: Cosine similarities between the drift direction and the correctness and uncertainty directions were below 0.14, consistent with the directions being nearly orthogonal.
Score Correlations: Correlation coefficients across various metrics remained low (|r| ≤ 0.20), further supporting the independence of drift.
Null-Space Projections: Projecting out one direction, with both bidirectional and iterative null-space projection, produced negligible differences, reinforcing the idea of geometric independence.
Confabulation Dynamics: The multi-layer perceptron (MLP) retrieval circuit demonstrated similar dynamics for both stale recall and confabulation, with correlation scores exceeding 0.81.
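
The cosine and null-space checks above can be illustrated with a small sketch. It assumes two hypothetical probe weight vectors, `w_drift` and `w_correct`, and a made-up activation `x`. The values are chosen for illustration and are not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def project_out(x, direction):
    """Remove the component of x along `direction` (a rank-1 null-space projection)."""
    scale = dot(x, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(x, direction)]

# Hypothetical probe weight vectors in a 3-dimensional toy space.
w_drift = [1.0, 0.05, 0.0]
w_correct = [0.05, 1.0, 0.2]

cos = cosine(w_drift, w_correct)

# Hypothetical activation; compare its drift score before and after
# the correctness direction has been projected out.
x = [2.0, -1.0, 0.5]
before = dot(x, w_drift)
x_clean = project_out(x, w_correct)
after = dot(x_clean, w_drift)

print(f"cosine(drift, correctness): {cos:.3f}")
print(f"drift score change after projection: {abs(after - before):.3f}")
```

When two directions are nearly orthogonal, removing one barely moves the score along the other, which is the intuition behind treating small projection differences as evidence of independence.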

Model-Specific Insights

The study also ran a cross-cutoff experiment in which the input stayed fixed and only the model varied. The probe activated for models trained before a fact’s transition but remained silent for the others, with probabilities ranging from 0.975 to 0.998 across twelve model pairs. This indicates that the probe reads the model’s internal knowledge state rather than properties of the input.
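
The logic of the cross-cutoff comparison can be sketched as follows. This is only an illustration: the model names, cutoff dates, transition date, and probe scores are all invented, and the pairing scheme is an assumption about how such a comparison might be set up.

```python
from datetime import date

# Hypothetical date on which some fact changed.
fact_transition = date(2023, 6, 1)

# Hypothetical training cutoffs for four models.
model_cutoffs = {
    "model_a": date(2022, 9, 1),
    "model_b": date(2023, 1, 1),
    "model_c": date(2023, 12, 1),
    "model_d": date(2024, 4, 1),
}

def holds_stale_fact(cutoff, transition):
    """A model trained before the transition can only have seen the old value."""
    return cutoff < transition

stale = sorted(m for m, c in model_cutoffs.items()
               if holds_stale_fact(c, fact_transition))
current = sorted(m for m, c in model_cutoffs.items()
                 if not holds_stale_fact(c, fact_transition))

# Every (stale, current) pair sees the same prompt; only the model differs.
pairs = [(s, c) for s in stale for c in current]

# Invented probe outputs for one shared prompt, for illustration only.
probe_scores = {"model_a": 0.91, "model_b": 0.87, "model_c": 0.08, "model_d": 0.12}

# Fraction of pairs in which the stale model gets the higher drift score.
pair_accuracy = sum(probe_scores[s] > probe_scores[c] for s, c in pairs) / len(pairs)

print("stale models:  ", stale)
print("current models:", current)
print("pair accuracy: ", pair_accuracy)
```

Because the prompt is identical within each pair, any difference in probe score has to come from the model's internal state, not from surface features of the input.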

Future Directions

The results point to a need for new methods that detect and address temporal knowledge drift in LLMs directly. The authors plan to publicly release their code and datasets to support further work in this area.

As LLMs continue to evolve, understanding and mitigating the impact of temporal knowledge drift will be essential for enhancing their reliability and accuracy in real-world applications.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
