SkillSieve: Efficient Detection of Malicious AI Agent Skills

Date:

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

Summary: arXiv:2604.06550v1 Announce Type: cross

In the fast-evolving landscape of artificial intelligence, security vulnerabilities in AI agent skills pose a significant threat. OpenClaw’s ClawHub marketplace, which hosts over 13,000 community-contributed agent skills, has revealed that between 13% and 26% of these skills contain security vulnerabilities according to recent audits. Traditional methods of detecting these vulnerabilities, such as regex scanners and formal static analyzers, often fall short. Regex scanners can miss obfuscated payloads, while formal analyzers struggle to interpret natural language instructions where prompt injection and social engineering attacks may be concealed.

Introducing SkillSieve

To address these limitations, SkillSieve has been developed as a three-layer detection framework that applies progressively deeper analysis only where necessary. This innovative approach enhances efficiency while improving detection rates for security vulnerabilities in AI agent skills. The framework operates as follows:

  • Layer 1: The initial layer runs regex, Abstract Syntax Tree (AST), and metadata checks through an XGBoost-based feature scorer. This filtering process efficiently eliminates roughly 86% of benign skills in under 40 milliseconds on average, incurring zero API cost.
  • Layer 2: Suspicious skills identified in the first layer are sent for deeper analysis by a Large Language Model (LLM). However, rather than posing a single broad question, Layer 2 divides the analysis into four parallel sub-tasks:
    • Intent Alignment
    • Permission Justification
    • Covert Behavior Detection
    • Cross-file Consistency

    Each sub-task has its own prompt and structured output, allowing for a nuanced examination of potential risks.

  • Layer 3: Skills deemed high-risk are presented before a jury of three different LLMs. These models vote independently on the risk level of the skill. In cases of disagreement, the LLMs engage in a debate to reach a consensus verdict, introducing a collaborative decision-making process.

Evaluation and Performance

The effectiveness of SkillSieve has been evaluated using a dataset of 49,592 real ClawHub skills, alongside adversarial samples across five distinct evasion techniques. The full pipeline was implemented on a 440 ARM single-board computer, showcasing its efficiency and practicality.

On a benchmark of 400 labeled skills, SkillSieve achieved an impressive F1 score of 0.800, significantly outperforming ClawVet, which recorded an F1 score of 0.421. Notably, the average cost per skill analyzed by SkillSieve was only 0.006, emphasizing the framework’s cost-effectiveness.

Open Source Commitment

In line with the principles of transparency and collaboration, the authors have made the code, data, and benchmark for SkillSieve open-sourced. This initiative encourages further research and development in the field of AI security, fostering a community-driven approach to enhancing the safety and reliability of AI agent skills.

SkillSieve represents a significant advancement in the detection of malicious AI agent skills, combining efficiency, accuracy, and innovative methodologies to address a pressing challenge in the AI landscape.


Related AI Insights

Lazarus Omolua
Lazarus Omoluahttps://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

Subscribe

Popular

More like this
Related

How Business Ops Teams Boost Productivity with Codex

Discover how business operations teams use Codex to streamline documentation, enhance collaboration, and improve decision-making with AI-powered automation...

OpenAI Partners with Malta to Offer ChatGPT Plus Nationwide

OpenAI and Malta team up to provide free ChatGPT Plus access and AI training to all citizens, promoting digital literacy and responsible AI use.

Critical Linux Kernel Flaw Risks SSH Host Key Theft

A critical Linux kernel flaw risks stolen SSH host keys. Learn how to protect your systems and stay secure until patches are widely available.

Top External Hard Drives 2026: Expert Reviews & Buying Guide

Discover the best external hard drives of 2026 with expert reviews. Find top picks for speed, durability, and security to suit all storage needs.