Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
arXiv:2604.19565v1
Abstract
Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals.
Introduction
Speech Large Language Models (SpeechLLMs) are increasingly used for tasks such as automatic speech recognition (ASR) and speech-to-text translation. They are prone to hallucinations: outputs that are nonsensical or not grounded in the input audio. Detecting such hallucinations is critical to the reliability and safety of these systems.
Challenges in Hallucination Detection
Current methods for detecting hallucinations in SpeechLLMs often rely on gold-standard outputs, which are difficult and expensive to obtain. This limits their use in real-world applications, where no reference output is available at inference time. In addition, most existing detection techniques were developed for text-based models and do not capture the audio-specific signals that may indicate hallucination.
Proposed Metrics for Detection
In our study, we explore four attention-derived metrics specifically designed to identify pathological attention patterns that may indicate hallucination:
- AUDIORATIO: Measures the ratio of audio features that correlate with hallucinations.
- AUDIOCONSISTENCY: Assesses the consistency of audio outputs against expected patterns.
- AUDIOENTROPY: Evaluates the unpredictability of audio outputs.
- TEXTENTROPY: Analyzes the text output’s complexity and coherence.
These metrics serve as features for training lightweight logistic regression classifiers aimed at enabling efficient inference-time detection of hallucinations.
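The text above names the metrics but gives no formulas, so the following is only a minimal sketch of how attention-derived features could feed a logistic regression score, not the authors' definitions. It assumes a hypothetical layout in which one head's attention weights list the audio positions first and the text positions after them. It reads AUDIORATIO as the share of attention mass on audio positions, and AUDIOENTROPY and TEXTENTROPY as normalized entropies of the attention over each segment. AUDIOCONSISTENCY is left out because the description is too brief to reconstruct. The function names, the feature names, and the weights passed to `hallucination_score` are all illustrative.

```python
import math


def normalized_entropy(weights):
    """Shannon entropy of the renormalized weights, scaled to [0, 1]."""
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))


def head_features(attn_row, n_audio):
    """Features for one attention head at one decoding step.

    attn_row: attention weights over the context; audio positions come
              first, then text positions (assumed layout).
    n_audio:  number of audio positions in attn_row.
    """
    audio, text = attn_row[:n_audio], attn_row[n_audio:]
    total = sum(attn_row)
    return {
        # Assumed reading of AUDIORATIO: attention mass placed on audio.
        "audio_ratio": sum(audio) / total if total > 0 else 0.0,
        # Assumed reading of AUDIOENTROPY / TEXTENTROPY: how diffuse the
        # attention is within each segment.
        "audio_entropy": normalized_entropy(audio),
        "text_entropy": normalized_entropy(text),
    }


def hallucination_score(features, weights, bias):
    """Logistic regression score: sigmoid(bias + sum_i w_i * x_i)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Usage with made-up numbers; real weights would come from fitting the
# classifier on examples labeled as hallucinated or faithful.
features = head_features([0.4, 0.4, 0.1, 0.1], n_audio=2)
score = hallucination_score(
    features,
    weights={"audio_ratio": -2.0, "audio_entropy": 1.0, "text_entropy": 0.5},
    bias=0.0,
)
```

Because scoring is a single dot product followed by a sigmoid, a detector of this shape adds very little cost on top of the forward pass that already produces the attention maps.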
Results and Evaluation
We evaluated the approach on two SpeechLLMs, Qwen-2-Audio and Voxtral-3B, across ASR and speech-to-text translation. The proposed approach outperforms uncertainty-based and prior attention-based baselines, with improvements of up to +0.23 PR-AUC on in-domain data. It also generalizes to out-of-domain ASR settings.
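For readers unfamiliar with the metric, PR-AUC is the area under the precision-recall curve, so a gain of +0.23 is an absolute difference in that area. The sketch below estimates it as average precision, a common estimator. Whether the authors use exactly this estimator is not stated, and the function name is illustrative.

```python
def average_precision(labels, scores):
    """Estimate PR-AUC as average precision.

    labels: 1 for a hallucinated output, 0 for a faithful one.
    scores: detector scores; higher means more likely hallucinated.
    Tied scores are not given special treatment in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


# Usage with made-up labels and scores.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Unlike ROC-AUC, this quantity depends on how rare the positive class is, which is why it is often preferred when hallucinated outputs are a small fraction of the data.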
Attention Head Analysis
Our analysis shows that strong performance can be achieved using approximately 100 attention heads, and that this subset generalizes better than using all available heads. Restricting the detector to a compact set of heads also keeps it lightweight in practical applications.
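The text does not say how the roughly 100 heads are chosen. As one hypothetical selection rule, the sketch below ranks heads by the absolute Pearson correlation between a per-head feature and the hallucination label, then keeps the top k. The criterion, the function names, and the (layer, head) identifiers are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_heads(per_head_values, labels, k):
    """Keep the k heads whose feature correlates most with the labels.

    per_head_values: dict mapping a head id, here a (layer, head) tuple,
                     to one feature value per training example.
    labels:          1 for hallucinated, 0 for faithful, same order.
    """
    ranked = sorted(
        per_head_values,
        key=lambda head: abs(pearson(per_head_values[head], labels)),
        reverse=True,
    )
    return ranked[:k]


# Usage with made-up values for three heads and four examples.
values = {
    (0, 3): [0.1, 0.9, 0.2, 0.8],
    (5, 1): [0.5, 0.5, 0.5, 0.5],
    (7, 2): [0.3, 0.4, 0.35, 0.3],
}
kept = select_heads(values, labels=[0, 1, 0, 1], k=2)
```

Any such ranking should be computed on training data only, so that the held-out and out-of-domain results are not influenced by the selection.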
Conclusion
While the effectiveness of our detection method is model-dependent and requires specific training for different tasks, our results illustrate that attention patterns are a valuable tool for detecting hallucinations in SpeechLLMs. As the field progresses, these findings could pave the way for more reliable and safe deployment of SpeechLLMs in real-world scenarios.
