AgentDrift: Unsafe LLM Recommendations Hidden by Metrics

Date:

AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

Summary: arXiv:2603.12564v3 Announce Type: replace-cross

Abstract

Tool-augmented LLM agents increasingly serve as multi-turn advisors in high-stakes domains, yet their evaluation relies on ranking-quality metrics that measure what is recommended but not whether it is safe for the user. We introduce a paired-trajectory protocol that replays real financial dialogues under clean and contaminated tool-output conditions across seven LLMs (7B to frontier) and decomposes divergence into information-channel and memory-channel mechanisms.

Key Findings

Across the seven models tested, we consistently observe the evaluation-blindness pattern: recommendation quality is largely preserved under contamination (utility preservation ratio approximately 1.0) while risk-inappropriate products appear in 65-93% of turns, a systematic safety failure poorly reflected by standard NDCG.

Safety Violations

Safety violations are predominantly information-channel-driven, emerge at the first contaminated turn, and persist without self-correction over 23-step trajectories. Notably, no agent across 1,563 contaminated turns explicitly questions tool-data reliability.

Impact of Narrative-Only Corruption

Even narrative-only corruption, such as biased headlines without numerical manipulation, induces significant drift while completely evading consistency monitors. This raises important concerns regarding the robustness of existing evaluation metrics.

Introducing sNDCG

We propose a safety-penalized NDCG variant (sNDCG) that reduces preservation ratios to 0.51-0.74, indicating that much of the evaluation gap becomes visible once safety is explicitly measured. This suggests that current metrics fail to capture critical safety concerns in multi-turn interactions.

Recommendations

These results motivate considering trajectory-level safety monitoring, beyond single-turn quality, for deployed multi-turn agents in high-stakes settings. The following recommendations are made:

  • Implement safety-penalized evaluation metrics to better assess the risk of recommendations.
  • Encourage developers to integrate real-time safety monitoring systems into LLM agents.
  • Conduct further research on the impact of information-channel and memory-channel mechanisms on recommendation quality.
  • Establish guidelines for evaluating the safety of tool outputs in high-stakes domains.

Conclusion

The findings from our study highlight significant gaps in the safety evaluation of LLM agents, particularly in high-stakes scenarios. By shifting focus towards trajectory-level assessments and implementing enhanced metrics, we can better ensure the reliability and safety of AI-driven recommendations.


Related AI Insights

Lazarus Omolua
Lazarus Omoluahttps://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

Subscribe

Popular

More like this
Related

How Business Ops Teams Boost Productivity with Codex

Discover how business operations teams use Codex to streamline documentation, enhance collaboration, and improve decision-making with AI-powered automation...

OpenAI Partners with Malta to Offer ChatGPT Plus Nationwide

OpenAI and Malta team up to provide free ChatGPT Plus access and AI training to all citizens, promoting digital literacy and responsible AI use.

Critical Linux Kernel Flaw Risks SSH Host Key Theft

A critical Linux kernel flaw risks stolen SSH host keys. Learn how to protect your systems and stay secure until patches are widely available.

Top External Hard Drives 2026: Expert Reviews & Buying Guide

Discover the best external hard drives of 2026 with expert reviews. Find top picks for speed, durability, and security to suit all storage needs.