TPA: Detecting Hallucinations in RAG with Token Attribution

TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

Summary of arXiv:2512.07515v4

Abstract

Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains an open challenge. Prior approaches attribute hallucinations to a binary conflict between the internal knowledge stored in the model's feed-forward networks (FFNs) and the retrieved context. This view is incomplete: it does not account for other influences on each generated token, such as the user query, previously generated tokens, the self token, and the final LayerNorm adjustment.

The Proposal: TPA

To address this, the paper introduces Next Token Probability Attribution (TPA). To capture the impact of these components on hallucination detection, TPA mathematically attributes each token's probability to seven distinct sources:

Query
RAG Context
Past Token
Self Token
FFN
Final LayerNorm
Initial Embedding

This attribution quantifies how much each source contributes to the generation of the next token. The resulting scores make the generative process easier to inspect and serve as signals for identifying hallucinated responses.
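To make the idea of per-source attribution concrete, here is a minimal sketch in Python. It is not the paper's method: it splits a single token's logit (not its probability) into additive parts, relying only on the fact that a logit is linear in the residual stream before the final LayerNorm. The per-source vectors, the `attribute_logit` helper, and the choice to credit the final LayerNorm with the logit change it causes are all assumptions made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Plain LayerNorm without learned scale or bias (a simplification).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def attribute_logit(parts, w):
    """Split one token's logit into per-source contributions.

    parts: dict mapping a source name to the vector that source wrote
           into the residual stream (hypothetical inputs).
    w:     unembedding row for the token being scored.
    """
    # The pre-LayerNorm logit is linear, so each source gets its own dot product.
    scores = {name: float(vec @ w) for name, vec in parts.items()}
    residual = np.sum(list(parts.values()), axis=0)
    # Assumption: credit the final LayerNorm with the logit change it causes.
    scores["Final LayerNorm"] = float(layer_norm(residual) @ w - residual @ w)
    return scores, residual

# Random stand-ins for the six residual-stream sources (illustrative only).
rng = np.random.default_rng(0)
d_model = 16
names = ["Query", "RAG Context", "Past Token", "Self Token",
         "FFN", "Initial Embedding"]
parts = {name: rng.normal(size=d_model) for name in names}
w = rng.normal(size=d_model)

scores, residual = attribute_logit(parts, w)
final_logit = float(layer_norm(residual) @ w)

for name, value in scores.items():
    print(f"{name:18s} {value:+.3f}")
print("sum of parts:", round(sum(scores.values()), 6))
print("final logit: ", round(final_logit, 6))
```

The seven scores sum to the final logit by construction, which is the property that makes an attribution of this kind complete. How TPA maps such contributions onto probabilities is not described in this summary.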

Analyzing Token Contributions

TPA then aggregates the attribution scores by Part-of-Speech (POS) tag, which quantifies how much each model component contributes to specific linguistic categories within a response. Patterns in these aggregates can signal hallucination: for instance, an anomaly in which nouns rely heavily on the final LayerNorm adjustment.
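The aggregation step can be sketched as follows. This is an illustrative example only: the `aggregate_by_pos` helper, the use of a mean as the aggregate, the POS tags, and the attribution values are all hypothetical, and in practice the tags would come from a POS tagger and the scores from the attribution step.

```python
from collections import defaultdict

def aggregate_by_pos(token_scores, pos_tags):
    """Average per-token attribution scores within each POS tag.

    token_scores: list of dicts, one per generated token, mapping a
                  source name to its attribution score.
    pos_tags:     list of POS tags aligned with token_scores.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for scores, tag in zip(token_scores, pos_tags):
        counts[tag] += 1
        for source, value in scores.items():
            sums[tag][source] += value
    return {
        tag: {source: total / counts[tag] for source, total in by_source.items()}
        for tag, by_source in sums.items()
    }

# Made-up scores for the response "Paris is capital" (three sources only).
token_scores = [
    {"RAG Context": 0.5, "FFN": 0.25, "Final LayerNorm": 0.25},   # Paris
    {"RAG Context": 0.25, "FFN": 0.5, "Final LayerNorm": 0.25},   # is
    {"RAG Context": 0.25, "FFN": 0.25, "Final LayerNorm": 0.5},   # capital
]
pos_tags = ["NOUN", "VERB", "NOUN"]

features = aggregate_by_pos(token_scores, pos_tags)
for tag, by_source in features.items():
    print(tag, by_source)
```

Each (POS tag, source) pair becomes one feature of the response, so a pattern such as nouns leaning on the final LayerNorm shows up as a single number that a downstream detector could inspect.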

Experimental Validation

The authors report that, in extensive experiments, TPA achieves state-of-the-art performance in detecting hallucinations in RAG systems. Beyond detection, the per-source attributions offer insight into the mechanisms behind token generation.

Conclusion

Detecting hallucinations in generative models remains a critical research problem. TPA contributes a framework that accounts for multiple sources of influence on token generation, rather than only a two-way conflict between FFN knowledge and retrieved context. For researchers and developers, this offers a finer-grained way to flag unreliable responses from RAG systems.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
