Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models
arXiv:2604.00445v1
Abstract: Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure, since most UE metrics originate from model behaviour, rather than being explicitly grounded in the factual correctness of LLM outputs.
With this, we show that UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method that remedies UE metrics by mapping their raw scores to truth-aligned scores. Even with noisy and few-shot supervision, TAC supports the learning of well-calibrated uncertainty estimates and provides a practical calibration protocol.
Our findings highlight the limitations of treating heuristic UE metrics as direct indicators of truth uncertainty, and position our TAC as a necessary step toward more reliable uncertainty estimation for LLMs.
Introduction
As large language models (LLMs) are deployed more widely, their reliability depends on estimating their uncertainty accurately. Uncertainty estimation (UE) aims to flag outputs that are likely to be unreliable or “hallucinated.” In practice, however, UE metrics often perform unstably across configurations, a phenomenon we formalise as “proxy failure.”
Understanding Proxy Failure
Proxy failure occurs when a UE metric fails to track the factual correctness of the outputs. It arises because most UE metrics are derived from model behaviour rather than grounded explicitly in whether an output is factually correct. The issue is most pronounced in low-information regimes, where UE metrics become non-discriminative: their scores no longer separate correct outputs from incorrect ones.
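One common way to check whether a UE metric separates correct from incorrect outputs is AUROC: the probability that a randomly chosen incorrect output receives a higher uncertainty score than a randomly chosen correct one. The sketch below is illustrative only. The function name `auroc` and the toy scores are assumptions made for this example, not data or an evaluation protocol taken from the paper.

```python
def auroc(scores, is_wrong):
    """Probability that a wrong output gets a higher uncertainty score
    than a correct one; ties count as half a win."""
    pos = [s for s, w in zip(scores, is_wrong) if w]       # incorrect outputs
    neg = [s for s, w in zip(scores, is_wrong) if not w]   # correct outputs
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect output")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks a hallucinated output, 0 a correct one.
labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # scores separate the two groups
collapsed = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # scores carry no signal

print(auroc(informative, labels))  # 1.0
print(auroc(collapsed, labels))    # 0.5
```

An AUROC near 0.5 means the metric does no better than chance at ranking incorrect outputs above correct ones, which is what "non-discriminative" means in practice.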
Introducing Truth AnChoring (TAC)
To address the limitations of traditional UE metrics, we propose a novel approach called Truth AnChoring (TAC). This post-hoc calibration method aims to align raw uncertainty scores with factual accuracy, thereby enhancing the reliability of uncertainty estimates. The main features of TAC include:
- Mapping Raw Scores: TAC maps raw UE scores to truth-aligned scores, so that the calibrated score is grounded in factual correctness rather than in model behaviour alone.
- Noisy and Few-Shot Supervision: The method is designed to learn well-calibrated uncertainty estimates even when supervision is noisy and only a few labelled examples are available.
- Practical Calibration Protocol: TAC offers a straightforward calibration process, making it accessible for implementation in various applications.
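To make the idea of post-hoc calibration concrete, the sketch below fits a mapping from raw uncertainty scores to a probability of correctness using a handful of labelled examples. It is not the TAC procedure, whose details are not given in this summary. It substitutes Platt-style logistic calibration as a generic stand-in, and the names `fit_platt` and `truth_aligned`, the hyperparameters, and the toy data (which includes one deliberately noisy label) are all assumptions made for illustration.

```python
import math


def fit_platt(scores, is_correct, lr=0.5, steps=5000):
    """Fit p(correct | s) = sigmoid(a * s + b) by gradient descent on log loss.

    Generic Platt-style calibration, used here as a stand-in; NOT the TAC method.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, is_correct):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def truth_aligned(score, a, b):
    """Map a raw uncertainty score to an estimated probability of correctness."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical few-shot calibration set: raw uncertainty score and whether
# the output was judged correct (1) or not (0). The label at 0.6 is noisy.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
correct = [1, 1, 1, 0, 1, 0, 0, 0]

a, b = fit_platt(raw_scores, correct)
for s in (0.1, 0.5, 0.9):
    print(f"raw={s:.1f} -> p(correct)={truth_aligned(s, a, b):.2f}")
```

The fitted slope `a` comes out negative on this data, so higher raw uncertainty maps to a lower estimated probability of correctness. For the method actually proposed, consult the paper and the linked repository.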
Conclusion
Our findings show the limitations of treating heuristic UE metrics as direct indicators of truth uncertainty. TAC addresses this by calibrating raw UE scores into truth-aligned scores after the fact, and it remains usable with noisy and few-shot supervision. We position it as a necessary step toward more reliable uncertainty estimation for LLMs.
The code repository for implementing Truth AnChoring is available at https://github.com/ponhvoan/TruthAnchor/.
