TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG
arXiv:2512.07515v4
Abstract
Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior methods attribute hallucinations to a binary conflict between the internal knowledge stored in feed-forward networks (FFNs) and the retrieved context. This perspective is incomplete: it overlooks the roles of other components of Large Language Models (LLMs), including the user query, previously generated tokens, the self token, and the final LayerNorm adjustment.
The Proposal: TPA
To address this, we introduce Next Token Probability Attribution (TPA), which mathematically attributes each token's probability to seven distinct sources:
- Query
- RAG Context
- Past Token
- Self Token
- FFN
- Final LayerNorm
- Initial Embedding
The attribution scores quantify how much each source contributes to the next token. This both clarifies the generative process and helps flag hallucinations that arise from interactions among these sources.
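The idea of splitting one token's probability across the seven sources can be illustrated with a toy telescoping decomposition. This is a minimal sketch under stated assumptions, not the formulation from the paper: the per-source residual vectors, the fixed accumulation order, the parameter-free `layer_norm`, and the names `ORDER` and `attribute_token` are all hypothetical. Each source is credited with the change in the target token's probability when its vector is added to a running residual, and Final LayerNorm is credited with the change caused by normalizing at the end, so the seven scores sum to the final probability. The result depends on the order in which sources are added.

```python
import numpy as np

# Hypothetical accumulation order over six of the seven sources;
# "Final LayerNorm" is handled separately at the end.
ORDER = ["Initial Embedding", "Query", "RAG Context",
         "Past Token", "Self Token", "FFN"]

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def layer_norm(x, eps=1e-5):
    # Parameter-free LayerNorm (no learned scale or bias), an assumption.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def target_prob(residual, unembed, target, use_ln):
    h = layer_norm(residual) if use_ln else residual
    return softmax(unembed @ h)[target]

def attribute_token(components, unembed, target):
    """Return (scores, final_prob); the seven scores sum to final_prob."""
    running = np.zeros(unembed.shape[1])
    prev = 0.0
    scores = {}
    for name in ORDER:
        running = running + components[name]
        p = target_prob(running, unembed, target, use_ln=False)
        scores[name] = p - prev  # probability change due to this source
        prev = p
    final = target_prob(running, unembed, target, use_ln=True)
    scores["Final LayerNorm"] = final - prev  # effect of normalization
    return scores, final

# Toy data: random vectors stand in for real model activations.
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
components = {name: rng.normal(size=d_model) for name in ORDER}
unembed = rng.normal(size=(vocab, d_model))
scores, final_p = attribute_token(components, unembed, target=3)
for name, value in scores.items():
    print(f"{name:18s} {value:+.4f}")
```

Scores can be negative when a source lowers the probability of the token that was ultimately generated.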
Analyzing Token Contributions
TPA also aggregates attribution scores by Part-of-Speech (POS) tags, which shows how different model components drive specific linguistic categories in a response. Anomalies in these aggregates, such as nouns relying disproportionately on the Final LayerNorm adjustment, signal potential hallucinations.
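The POS aggregation step can be sketched as a grouped mean over per-token attribution scores. This is an illustrative sketch only: it assumes each generated token already carries exactly one POS tag (produced by some external tagger and aligned to the model's tokens), the function name `aggregate_by_pos` is hypothetical, and the numbers are made up for demonstration rather than taken from any experiment.

```python
from collections import defaultdict

SOURCES = ["Query", "RAG Context", "Past Token", "Self Token",
           "FFN", "Final LayerNorm", "Initial Embedding"]

def aggregate_by_pos(pos_tags, token_scores):
    """Mean attribution per (POS tag, source).

    pos_tags:     one tag per generated token, e.g. "NOUN"
    token_scores: one dict per token mapping source name -> score;
                  sources missing from a dict count as 0.0
    """
    if len(pos_tags) != len(token_scores):
        raise ValueError("need exactly one POS tag per token")
    sums = defaultdict(lambda: dict.fromkeys(SOURCES, 0.0))
    counts = defaultdict(int)
    for tag, scores in zip(pos_tags, token_scores):
        counts[tag] += 1
        for source in SOURCES:
            sums[tag][source] += scores.get(source, 0.0)
    return {tag: {s: sums[tag][s] / counts[tag] for s in SOURCES}
            for tag in sums}

# Made-up scores for three tokens, for illustration only.
tags = ["NOUN", "VERB", "NOUN"]
per_token = [
    {"RAG Context": 0.50, "FFN": 0.10, "Final LayerNorm": 0.30},
    {"Query": 0.20, "Past Token": 0.40},
    {"RAG Context": 0.70, "Final LayerNorm": 0.10},
]
profile = aggregate_by_pos(tags, per_token)
print(profile["NOUN"]["Final LayerNorm"])
```

The resulting per-tag profile is a fixed-size summary of a response, which is the kind of representation a downstream detector could inspect for patterns such as nouns leaning heavily on Final LayerNorm.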
Experimental Validation
Extensive experiments show that TPA achieves state-of-the-art performance in detecting hallucinations in RAG systems. The attribution scores also offer insight into the mechanisms behind token generation.
Conclusion
Detecting hallucinations in generative models remains a critical research problem. TPA advances it by offering a framework that accounts for multiple sources of influence on token generation rather than a single conflict between FFN knowledge and retrieved context. Researchers and developers can use these attributions to build more reliable RAG systems.
