SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
arXiv:2604.13630v1 (cross-listed)
Abstract
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from a structural mismatch that leaves them blind to harness-internal state and unable to coordinate across the different phases of agent operation.
In this paper, we introduce SafeHarness, a security architecture that weaves four defense layers directly into the agent lifecycle to address these limitations, with one layer per lifecycle phase:
- Adversarial Context Filtering: Implemented at input processing.
- Tiered Causal Verification: Applied during decision making.
- Privilege-Separated Tool Control: Enforced at action execution.
- Safe Rollback with Adaptive Degradation: Executed at state update.
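To make the mapping between layers and lifecycle phases concrete, the following is a minimal sketch of one agent step with a hook at each phase. It is not the SafeHarness implementation: every name (`AgentState`, `filter_context`, `verify_decision`, `tool_permitted`, `run_step`, `ALLOWED_TOOLS`) is hypothetical, and each check is a toy stand-in. In particular, the decision check is a simple presence test rather than tiered causal verification, and adaptive degradation is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical names throughout; the source does not specify an API.
ALLOWED_TOOLS = {"search", "read_file"}            # assumed allow-list
INJECTION_MARKER = "ignore previous instructions"  # toy heuristic, not a real filter


@dataclass
class AgentState:
    context: list = field(default_factory=list)
    log: list = field(default_factory=list)


def filter_context(text: str) -> str:
    """Input processing: drop lines that match a toy injection pattern."""
    kept = [ln for ln in text.splitlines() if INJECTION_MARKER not in ln.lower()]
    return "\n".join(kept)


def verify_decision(tool: str, reason: str) -> bool:
    """Decision making: toy check that the decision carries a justification."""
    return bool(reason.strip())


def tool_permitted(tool: str) -> bool:
    """Action execution: allow only tools on the allow-list."""
    return tool in ALLOWED_TOOLS


def run_step(
    state: AgentState,
    user_input: str,
    decide: Callable[[str], Tuple[str, str]],
    execute: Callable[[str], str],
) -> Optional[str]:
    snapshot = list(state.context)      # state update: checkpoint first
    clean = filter_context(user_input)
    state.context.append(clean)
    tool, reason = decide(clean)
    if not (verify_decision(tool, reason) and tool_permitted(tool)):
        state.context = snapshot        # roll back to the checkpoint
        state.log.append(f"blocked:{tool}")
        return None
    result = execute(tool)
    state.context.append(result)
    state.log.append(f"ran:{tool}")
    return result
```

In this sketch a blocked step leaves `state.context` exactly as it was before the step, which is the property a rollback layer is meant to provide; a real harness would also need to undo external side effects.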
Cross-Layer Mechanisms
The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. Security thus becomes part of the LLM agent’s operational lifecycle rather than an afterthought.
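One way to picture this coordination is a shared monitor that counts recent anomalies and, once they are sustained, raises a single alert level that the other layers read. The sketch below assumes a sliding window with a fixed threshold; the class `CrossLayerMonitor`, the table `TOOLS_BY_LEVEL`, and all parameter values are hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumed privilege tiers: a higher alert level means fewer permitted tools.
TOOLS_BY_LEVEL = [
    {"search", "read_file", "write_file"},
    {"search", "read_file"},
    {"search"},
]


@dataclass
class CrossLayerMonitor:
    window: int = 5      # number of recent steps considered (assumed)
    threshold: int = 3   # anomalies within the window that count as "sustained"
    level: int = 0       # shared alert level read by the other layers
    recent: deque = field(default_factory=deque)

    def observe(self, anomalous: bool) -> dict:
        """Record one step and return the settings the layers should apply."""
        self.recent.append(anomalous)
        if len(self.recent) > self.window:
            self.recent.popleft()
        sustained = sum(self.recent) >= self.threshold
        if sustained and self.level < len(TOOLS_BY_LEVEL) - 1:
            self.level += 1       # escalate one tier
            self.recent.clear()   # require fresh evidence before escalating again
        return {
            "verification_tier": self.level,            # stricter checks at higher tiers
            "rollback": sustained,                      # ask the state layer to roll back
            "allowed_tools": TOOLS_BY_LEVEL[self.level],
        }
```

A single isolated anomaly changes nothing here; only repeated anomalies within the window tighten privileges, which matches the "sustained anomalies" wording above. How the real system defines and detects an anomaly is not stated in this summary.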
Evaluation and Results
We evaluate SafeHarness on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. The evaluation metrics include Unsafe Behavior Rate (UBR) and Attack Success Rate (ASR).
Compared to the unprotected baseline, SafeHarness reduces UBR by approximately 38% and ASR by approximately 42% on average, while preserving core task utility.
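The summary does not define UBR and ASR formally, nor does it say whether the reported reductions are relative or absolute. The sketch below assumes each metric is the fraction of evaluation episodes flagged (unsafe behavior for UBR, successful attack for ASR) and shows both ways of computing a reduction. The function names are hypothetical and the episode flags are toy values for illustration only, not results from the paper.

```python
from typing import Sequence


def rate(flags: Sequence[bool]) -> float:
    """Fraction of episodes flagged, e.g. unsafe (UBR) or attack succeeded (ASR)."""
    if not flags:
        raise ValueError("need at least one episode")
    return sum(flags) / len(flags)


def absolute_reduction(baseline: float, protected: float) -> float:
    """Difference between the two rates."""
    return baseline - protected


def relative_reduction(baseline: float, protected: float) -> float:
    """Difference as a fraction of the baseline rate."""
    if baseline == 0:
        raise ValueError("baseline rate is zero; relative reduction is undefined")
    return (baseline - protected) / baseline


# Toy per-episode flags (illustrative only, not data from the paper).
baseline_unsafe = [True] * 5 + [False] * 5    # 5 of 10 episodes unsafe
protected_unsafe = [True] * 2 + [False] * 8   # 2 of 10 episodes unsafe

ubr_base = rate(baseline_unsafe)
ubr_prot = rate(protected_unsafe)
print(f"absolute: {absolute_reduction(ubr_base, ubr_prot):.2f}")
print(f"relative: {relative_reduction(ubr_base, ubr_prot):.2f}")
```

The two readings can differ substantially for the same data, so the interpretation of "approximately 38%" and "42%" depends on which one the authors use.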
Conclusion
By integrating security measures throughout the agent lifecycle, SafeHarness mitigates risks that originate at the execution harness. The reported reductions in UBR and ASR, achieved while preserving core task utility, indicate a more robust operational framework for LLM-based agents.
