A Systematic Security Evaluation of OpenClaw and Its Variants
arXiv:2604.03131v1
Abstract
Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be identified through model-only evaluation. In this paper, we present a systematic security assessment of six representative OpenClaw-series agent frameworks, namely OpenClaw, AutoClaw, QClaw, KimiClaw, MaxClaw, and ArkClaw, under multiple backbone models.
Introduction
Rapid progress in AI has produced intelligent agents that pair a large language model with external tools. Tool access extends what these agents can do, but it also creates security challenges that the model alone does not face. This paper examines the vulnerabilities of the OpenClaw-series frameworks in order to characterize their security posture.
Methodology
To support our evaluation, we constructed a benchmark of 205 test cases covering representative attack behaviors across the full agent execution lifecycle. This approach enables a unified evaluation of risk exposure at both the framework and model levels, facilitating a comprehensive analysis of security vulnerabilities.
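One way to organize such a benchmark is sketched below. The field names, the stage labels, and the toy outcomes are all hypothetical, since this summary does not specify the benchmark schema; the sketch only illustrates how one set of test cases can yield attack-success rates grouped either by framework or by backbone model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    # Hypothetical schema; the paper's actual benchmark format is not given here.
    case_id: str
    category: str  # attack behavior, e.g. "credential_leakage"
    stage: str     # lifecycle stage label, illustrative only


@dataclass(frozen=True)
class Outcome:
    case: BenchmarkCase
    framework: str
    model: str
    attack_succeeded: bool


def success_rate_by(outcomes, key):
    """Return {group: fraction of outcomes in which the attack succeeded}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for outcome in outcomes:
        group = key(outcome)
        totals[group] += 1
        hits[group] += int(outcome.attack_succeeded)
    return {group: hits[group] / totals[group] for group in totals}


# Toy data invented for illustration only (not results from the paper).
c1 = BenchmarkCase("tc-001", "reconnaissance", "planning")
c2 = BenchmarkCase("tc-002", "credential_leakage", "tool_execution")
outcomes = [
    Outcome(c1, "framework-A", "model-X", True),
    Outcome(c2, "framework-A", "model-X", False),
    Outcome(c1, "framework-B", "model-X", True),
    Outcome(c2, "framework-B", "model-Y", True),
]

# The same outcomes, viewed at the framework level and at the model level.
by_framework = success_rate_by(outcomes, key=lambda o: o.framework)
by_model = success_rate_by(outcomes, key=lambda o: o.model)
```

Passing the grouping as a `key` function keeps the aggregation identical for both views, which is what makes the framework-level and model-level numbers directly comparable.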
Key Findings
Our results show that all evaluated agents exhibit substantial security vulnerabilities. Specifically, we found that:
- Agentized systems are significantly riskier than their underlying models used in isolation.
- Reconnaissance and discovery behaviors emerged as the most common weaknesses across all frameworks.
- Different frameworks expose distinct high-risk profiles, including:
  - Credential leakage
  - Lateral movement
  - Privilege escalation
  - Resource development
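The idea of a per-framework risk profile can be made concrete with a small sketch. The records below are hypothetical placeholders rather than data from the paper, and the framework names are deliberately generic; the code only shows how counting successful attacks per category surfaces each framework's dominant risk.

```python
from collections import Counter

# Hypothetical (framework, category, attack_succeeded) records; not paper data.
records = [
    ("framework-A", "credential_leakage", True),
    ("framework-A", "credential_leakage", True),
    ("framework-A", "lateral_movement", False),
    ("framework-B", "privilege_escalation", True),
    ("framework-B", "credential_leakage", False),
    ("framework-B", "privilege_escalation", True),
]


def risk_profile(records):
    """Map each framework to a Counter of successful attacks per category."""
    profile = {}
    for framework, category, succeeded in records:
        counts = profile.setdefault(framework, Counter())
        if succeeded:
            counts[category] += 1
    return profile


def top_category(counts):
    """Return the category with the most successful attacks, or None."""
    most = counts.most_common(1)
    return most[0][0] if most else None


profile = risk_profile(records)
top = {framework: top_category(counts) for framework, counts in profile.items()}
```

A real analysis would likely normalize by the number of test cases per category, since raw counts favor categories with more tests.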
Implications
These findings indicate that the security of modern agent systems is influenced not only by the safety properties of the backbone model but also by the interplay among model capability, tool use, multi-step planning, and runtime orchestration. The study highlights that once an agent is granted execution capability and persistent runtime context, weaknesses arising in early stages can be amplified into concrete system-level failures.
Conclusion
Overall, our study underscores the necessity of moving beyond prompt-level safeguards toward lifecycle-wide security governance for intelligent agent frameworks. As AI technology continues to evolve, comprehensive security assessments will be essential to mitigate risks associated with tool-augmented agents.
Future Work
Future research will focus on developing more robust security frameworks that can adapt to the evolving nature of AI agents and their operational environments. Additionally, we aim to create tools that help identify and remediate security vulnerabilities in real time.
