Measuring LLM Trust Allocation Across Conflicting Software Artifacts
Summary: arXiv:2604.03447v1
Abstract: LLM-based software engineering assistants fail not only by producing incorrect outputs, but also by allocating trust to the wrong artifact when code, documentation, and tests disagree. Existing evaluations focus mainly on downstream outcomes and therefore cannot reveal whether a model recognized degraded evidence, identified the unreliable source, or calibrated its trust across artifacts.
We present TRACE (Trust Reasoning over Artifacts for Calibrated Evaluation), a framework that elicits structured artifact-level trust traces over Javadoc, method signatures, implementations, and test prefixes under blind perturbations. Using 22,339 valid traces from seven models on 456 curated Java method bundles, we evaluate per-artifact quality assessment, inconsistency detection, affected artifact attribution, and source prioritization.
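To make the idea of an artifact-level trust trace concrete, the following is a minimal Python sketch of what one trace record might look like. Only the four artifact types and the four evaluated tasks come from the abstract; the field names, the 0 to 1 score scale, the validity rule, and the `quality_penalty` helper are assumptions for illustration, not the schema TRACE actually uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Artifact(Enum):
    """The four artifact types named in the abstract."""
    JAVADOC = "javadoc"
    SIGNATURE = "signature"
    IMPLEMENTATION = "implementation"
    TEST_PREFIX = "test_prefix"


@dataclass
class TrustTrace:
    """One model response for one method bundle (hypothetical schema)."""
    quality: Dict[Artifact, float]         # per-artifact quality score, assumed in [0, 1]
    inconsistency_detected: bool           # did the model report any disagreement?
    affected_artifact: Optional[Artifact]  # artifact the model blames, if any
    priority: List[Artifact]               # source prioritization, most trusted first
    confidence: float                      # self-reported confidence, assumed in [0, 1]

    def is_valid(self) -> bool:
        """Assumed validity rule: every artifact scored and ranked exactly once."""
        scores_ok = set(self.quality) == set(Artifact) and all(
            0.0 <= score <= 1.0 for score in self.quality.values()
        )
        ranking_ok = (
            len(self.priority) == len(Artifact)
            and set(self.priority) == set(Artifact)
        )
        return scores_ok and ranking_ok and 0.0 <= self.confidence <= 1.0


def quality_penalty(clean: TrustTrace, perturbed: TrustTrace) -> Dict[Artifact, float]:
    """Per-artifact drop in quality score from the clean to the perturbed bundle."""
    return {a: clean.quality[a] - perturbed.quality[a] for a in Artifact}
```

Under this sketch, a penalty that is large for the perturbed artifact and near zero for the others would correspond to well-localized trust adjustment.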
Key Findings
- Quality penalties are primarily localized to the perturbed artifact, and their size varies with the severity of the perturbation.
- Sensitivity to errors is not uniform across artifact types: documentation bugs produce a larger gap in trust allocation (0.152 to 0.253) than implementation faults (0.049 to 0.123).
- Models detect explicit documentation bugs well, with detection rates of 67% to 94%, and identify contradictions between Javadoc and implementation at rates of 50% to 91%.
- However, models have a notable blind spot when the implementation drifts while the documentation still appears plausible: detection accuracy drops by 7 to 42 percentage points.
- Confidence remains poorly calibrated for six of the seven models tested.
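The calibration finding can be illustrated with expected calibration error (ECE), a standard way to compare stated confidence against observed accuracy. This summary does not say which calibration measure the paper uses, so the choice of ECE, the function name, and the ten-bin default are all assumptions made for this sketch.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between average confidence and accuracy per bin.

    Uses equal-width bins over [0, 1]; 0.0 means perfectly calibrated.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    if not confidences:
        return 0.0

    conf_sum = [0.0] * n_bins  # total confidence per bin
    hits = [0] * n_bins        # number of correct answers per bin
    count = [0] * n_bins       # number of samples per bin

    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += 1 if ok else 0
        count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for i in range(n_bins):
        if count[i] == 0:
            continue
        avg_conf = conf_sum[i] / count[i]
        accuracy = hits[i] / count[i]
        ece += (count[i] / total) * abs(avg_conf - accuracy)
    return ece
```

For example, a model that reports 0.9 confidence on ten judgments and gets nine right has an ECE near zero, while the same confidence with every judgment wrong gives an ECE of about 0.9.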
Implications for Software Engineering
The findings suggest that current large language models (LLMs) are better at auditing natural-language specifications than at spotting subtle discrepancies at the code level. This calls into question their reliability in settings where correctness is paramount.
Prioritizing artifact-level trust reasoning is essential before deploying these models in correctness-critical applications. The TRACE framework not only offers a method for evaluating trust allocation but also provides insights into how models can better navigate the complexities of conflicting software artifacts.
Conclusion
As LLMs continue to be integrated into software engineering practices, understanding their limitations in trust allocation will be crucial. The TRACE framework serves as a valuable tool for assessing and calibrating trust across different software artifacts, ultimately guiding improvements in model training and evaluation.
By addressing these challenges, researchers and practitioners can work towards enhancing the reliability of LLMs, ensuring they are equipped to handle the nuanced realities of software development.
