Context Contamination in LLM Pipelines: Why Retrying Fails

Why Retrying Fails: Context Contamination in LLM Agent Pipelines

Source: arXiv:2605.08563v1

Abstract: Context contamination is a significant barrier to reliable outcomes when Large Language Model (LLM) agents run multi-step, tool-augmented tasks. When an agent fails and then retries a task, the retry suffers an elevated per-step error rate. Our research introduces the Context-Contaminated Restart Model (CCRM), a theoretical framework for quantifying and analyzing this effect.

Understanding Context Contamination

Context contamination occurs when an LLM agent carries information from a failed attempt into its next try. The retained context can mislead the model and make further failures more likely. CCRM models this effect and compares it against a clean-restart baseline, in which the context is cleared before the retry.
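The difference between a contaminated retry and a clean restart can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Attempt` type, the `run_attempt` callable, and the list-of-strings context are assumptions, not part of the paper or of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """Outcome of one agent attempt (hypothetical structure for illustration)."""
    succeeded: bool
    transcript: List[str]  # messages and tool calls produced during the attempt


# Hypothetical signature: takes the starting context, returns the attempt outcome.
RunAttempt = Callable[[List[str]], Attempt]


def retry(task: str, run_attempt: RunAttempt, max_attempts: int,
          clean_restart: bool) -> Optional[Attempt]:
    """Retry a task up to max_attempts times.

    clean_restart=True  -> every attempt starts from the task prompt alone.
    clean_restart=False -> each retry also sees the failed transcripts so far.
    """
    context = [task]
    for _ in range(max_attempts):
        # Pass a copy so the attempt cannot mutate the caller's context.
        attempt = run_attempt(list(context))
        if attempt.succeeded:
            return attempt
        if not clean_restart:
            # Contaminated retry: the failed transcript stays in the context.
            context.extend(attempt.transcript)
    return None
```

With `clean_restart=True` the context never grows beyond the task prompt, which is the baseline that the clean-restart comparison refers to.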

Key Results of the Context-Contaminated Restart Model

Our research yields five pivotal results:

Result 1 (R1): An exact closed-form formula for the probability of succeeding within at most K attempts, accounting for the effects of context contamination.
Result 2 (R2): A cascade-overhead theorem that quantifies the additional attempts, ΔK, required because of contamination, relative to a clean-restart baseline.
Result 3 (R3): An optimal budget-allocation theorem that identifies the pipeline depth T* maximizing success probability for a fixed total budget B = KT. The closed form is T* = sqrt(B * log(1/(1-epsilon_1)) / log(1/(1-epsilon_0))), with K* = B/T*.
Result 4 (R4): An information-theoretic lower bound, established via Le Cam’s method, showing that K_CCRM is tight up to O(1).
Result 5 (R5): A clean-restart dominance theorem that quantifies the benefit of clearing context before a retry attempt.
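The closed form quoted in R3 is simple to evaluate numerically. The sketch below transcribes that expression directly; the function name and the example values are illustrative assumptions, and it reads epsilon_0 as the clean per-step error rate and epsilon_1 as the contaminated one, which this summary implies but does not state outright.

```python
import math
from typing import Tuple


def optimal_depth(budget: float, eps0: float, eps1: float) -> Tuple[float, float]:
    """Evaluate T* and K* from the closed form quoted in R3.

    budget is the total budget B = K * T; eps0 and eps1 are assumed to be
    per-step error rates strictly between 0 and 1.
    """
    ratio = math.log(1 / (1 - eps1)) / math.log(1 / (1 - eps0))
    t_star = math.sqrt(budget * ratio)
    k_star = budget / t_star
    return t_star, k_star


# Illustrative values only (not taken from the paper's experiments).
t_star, k_star = optimal_depth(budget=100, eps0=0.01, eps1=0.071)
print(f"T* = {t_star:.2f}, K* = {k_star:.2f}")
```

The formula returns real numbers, so in practice T* and K* would need rounding to integers; how the paper handles that rounding is not covered in this summary.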

Empirical Validation of CCRM

To validate our model, we applied CCRM to real data from the SWE-bench Verified dataset. The IID model, which treats every attempt as independent, significantly overestimates the pass rate at three attempts, projecting a success rate of 98.6%. CCRM fits the observed performance with an error of less than 0.001. The fit implies a cascade ratio of epsilon_1/epsilon_0 = 7.1, suggesting that context contamination has a large effect on retry outcomes.
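To make the IID baseline concrete, here is a small sketch assuming it is the usual independent-attempts formula, pass@K = 1 - (1 - p)^K, where p is the single-attempt pass rate. That reading of "IID model" and both function names are assumptions; only the 98.6% figure at three attempts comes from the text above.

```python
def iid_pass_at_k(p_single: float, k: int) -> float:
    """Pass rate within k attempts if every attempt succeeds independently."""
    return 1.0 - (1.0 - p_single) ** k


def implied_single_attempt_rate(pass_at_k: float, k: int) -> float:
    """Invert the IID formula: the single-attempt rate that yields pass_at_k."""
    return 1.0 - (1.0 - pass_at_k) ** (1.0 / k)


# Single-attempt pass rate that an IID projection of 98.6% at K = 3 implies.
p1 = implied_single_attempt_rate(0.986, 3)
print(f"implied single-attempt pass rate: {p1:.3f}")
```

Under this reading, the IID projection assumes that a retry is as likely to succeed as the first attempt, which is exactly the assumption that context contamination breaks.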

Monte Carlo Experiments

We also ran Monte Carlo simulations to check our theoretical predictions. These experiments consistently showed that the contamination effects modeled by CCRM align closely with behavior observed in LLM pipelines, supporting both the theoretical soundness of the model and its practical relevance for improving LLM agents.
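A Monte Carlo check of this kind can be sketched as follows. The model used here is a deliberately simplified assumption, not the paper's CCRM formula: the first attempt fails each of its T steps independently with probability eps0, and every retry fails each step with a higher probability eps1. All function and parameter names are hypothetical.

```python
import random


def simulate_success_rate(eps0: float, eps1: float, depth: int,
                          max_attempts: int, trials: int, seed: int = 0) -> float:
    """Estimate the success rate within max_attempts under the two-rate model."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        for attempt in range(max_attempts):
            # Assumption: only the first attempt runs with the clean rate eps0.
            eps = eps0 if attempt == 0 else eps1
            # An attempt succeeds only if every one of its steps succeeds.
            if all(rng.random() >= eps for _ in range(depth)):
                successes += 1
                break
    return successes / trials


def closed_form_success_rate(eps0: float, eps1: float, depth: int,
                             max_attempts: int) -> float:
    """Closed form for the same simplified two-rate model."""
    first = (1 - eps0) ** depth   # success probability of the first attempt
    later = (1 - eps1) ** depth   # success probability of each retry
    return 1 - (1 - first) * (1 - later) ** (max_attempts - 1)


# Illustrative parameters only.
estimate = simulate_success_rate(0.05, 0.2, depth=5, max_attempts=3, trials=20000)
exact = closed_form_success_rate(0.05, 0.2, depth=5, max_attempts=3)
print(f"simulated = {estimate:.3f}, closed form = {exact:.3f}")
```

Setting eps1 equal to eps0 reduces the sketch to the IID case, which gives a quick sanity check on both functions.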

Conclusion

As LLM agents are deployed more widely, understanding and addressing context contamination becomes increasingly important. The Context-Contaminated Restart Model provides a framework for recognizing the limitations of current retry mechanisms and points to ways of improving agent performance, such as clearing context before a retry. Future work should refine the model and develop strategies to mitigate the adverse effects of context contamination.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
