Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity
arXiv:2604.13061v1
Abstract
Large language models (LLMs) are increasingly deployed in high-stakes autonomous and interactive workflows, where reliability demands continuous, multi-turn coherence. However, current evaluation methods either rely on post-hoc semantic judges, measure unidirectional token confidence (e.g., perplexity), or require compute-intensive repeated sampling (e.g., semantic entropy). Because these techniques focus exclusively on the model’s output distribution, they cannot monitor whether the underlying interaction remains structurally coupled in real time, leaving systems vulnerable to gradual, undetected degradation.
Introduction
LLMs are often engaged in long, multi-turn dialogues where the stakes are high, so operators need a way to check, while the conversation is still running, that it remains coherent and relevant. Existing evaluation methods fall short here: semantic judges run after the fact, perplexity measures confidence in one direction only, and semantic entropy requires repeated sampling. None of them gives real-time insight into whether the interaction itself is holding together.
Introducing Bi-Predictability
The paper proposes monitoring interaction integrity with a metric called bi-predictability (P). The measure is grounded in information theory and can be computed directly from raw token frequency statistics, which allows real-time assessment of multi-turn interactions.
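This summary does not reproduce the paper's formula for P, so the following is only a minimal sketch of what "computed from raw token frequency statistics" could look like. It assumes a hypothetical stand-in score, the mutual information between two position-paired token sequences divided by the sum of their entropies. The function names, the pairing scheme, and the normalization are all illustrative assumptions, not the authors' definition.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution behind `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coupling_score(xs, ys):
    """Hypothetical stand-in for P: I(X;Y) / (H(X) + H(Y)).

    xs and ys are equal-length token sequences paired position by position
    (an assumption made for illustration only).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    h_x = entropy_bits(Counter(xs))
    h_y = entropy_bits(Counter(ys))
    h_xy = entropy_bits(Counter(zip(xs, ys)))
    mutual_info = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = h_x + h_y
    return mutual_info / denom if denom > 0 else 0.0


# Perfectly coupled: each token in ys is determined by the paired token in xs.
coupled = coupling_score(["a", "b", "a", "b"], ["x", "y", "x", "y"])
# Uncoupled: knowing the token in xs says nothing about the token in ys.
uncoupled = coupling_score(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(coupled, uncoupled)
```

With this particular normalization the score ranges from 0 (no shared information) to 0.5 (each sequence fully determines the other). Only counting and logarithms are involved, which is why a frequency-based score of this kind is cheap enough to evaluate on every turn.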
Information Digital Twin (IDT)
The IDT is a lightweight architecture that estimates bi-predictability across the context-response-next prompt loop without relying on secondary inference or embeddings. Because it needs neither an extra model call nor an embedding pass, it can track the integrity of an LLM interaction as the conversation unfolds.
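The summary does not describe the IDT's internals. As a rough illustration of per-turn monitoring, the sketch below assumes a score is already available for each turn and flags any turn that falls far outside a rolling baseline. The class name `CouplingMonitor`, the window size, and the `k`-sigma rule are hypothetical choices, not details from the paper.

```python
from collections import deque
from statistics import mean, pstdev


class CouplingMonitor:
    """Flags turns whose score deviates sharply from a rolling baseline (hypothetical)."""

    def __init__(self, window=5, k=3.0, min_std=1e-3):
        self.history = deque(maxlen=window)  # most recent non-flagged scores
        self.window = window
        self.k = k                # alarm threshold in standard deviations
        self.min_std = min_std    # floor so a flat baseline still has a tolerance

    def update(self, score):
        """Return True if `score` is anomalous; otherwise add it to the baseline."""
        if len(self.history) < self.window:
            self.history.append(score)  # still warming up, never alarm
            return False
        mu = mean(self.history)
        sigma = max(pstdev(self.history), self.min_std)
        alarmed = abs(score - mu) > self.k * sigma
        if not alarmed:
            self.history.append(score)  # flagged turns do not pollute the baseline
        return alarmed


monitor = CouplingMonitor(window=5, k=3.0)
scores = [0.30, 0.31, 0.29, 0.30, 0.31, 0.30, 0.10, 0.30]  # made-up per-turn scores
flags = [monitor.update(s) for s in scores]
print(flags)
```

Keeping flagged turns out of the baseline is a design choice: a disrupted turn would otherwise widen the tolerance band and make the next disruption harder to see.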
Key Findings
- The IDT was tested across 4,500 conversational turns between a student model and three frontier teacher models.
- It detected injected disruptions with 100% sensitivity.
- Structural coupling and semantic quality diverged: bi-predictability aligned with structural consistency in 85% of conditions but matched semantic judge scores in only 44%.
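For readers unfamiliar with how such figures are tallied, here is a small sketch of the two kinds of statistic quoted above: sensitivity (the share of injected disruptions that were flagged) and an agreement rate between two sets of binary verdicts. The data below is made up for illustration, and how the paper actually scored alignment per condition is not stated in this summary.

```python
def sensitivity(detected, injected):
    """True-positive rate: fraction of injected disruptions that were detected."""
    true_pos = sum(1 for d, i in zip(detected, injected) if d and i)
    false_neg = sum(1 for d, i in zip(detected, injected) if not d and i)
    if true_pos + false_neg == 0:
        raise ValueError("no injected disruptions to score")
    return true_pos / (true_pos + false_neg)


def agreement_rate(a, b):
    """Fraction of positions where two lists of binary verdicts match."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy data: five turns, two of which contain an injected disruption.
injected = [False, True, False, True, False]
detected = [False, True, True, True, False]  # one false alarm on the third turn

print(sensitivity(detected, injected))     # both injected disruptions were caught
print(agreement_rate(detected, injected))  # verdicts match on four of five turns
```

In the toy data, sensitivity is perfect even though one clean turn was flagged, because sensitivity counts only missed disruptions and says nothing about false alarms.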
Silent Uncoupling
These findings expose a critical phenomenon known as “silent uncoupling,” where LLMs can generate high-scoring outputs while the conversational context deteriorates. This disconnect highlights the necessity for robust structural monitoring independent of semantic evaluation.
Conclusion
By decoupling structural monitoring from semantic assessment, the IDT presents a scalable and computationally efficient solution for real-time AI assurance and closed-loop regulation. It targets the reliability of LLMs in high-stakes scenarios, where degradation that goes unnoticed by semantic judges is most costly.
Future Work
Further research is needed to refine these methods and to test how well bi-predictability monitoring carries over to other domains.
