RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems
Federal agencies increasingly rely on Retrieval-Augmented Generation (RAG) systems for citizen-facing services, which raises concerns about their vulnerability to knowledge base poisoning attacks. In these attacks, adversaries inject malicious documents into the knowledge base to manipulate the output the system generates. A recent study showed that as few as ten adversarial passages can achieve retrieval success rates of up to 98.2%.
This article introduces RAGShield, a five-layer defense-in-depth framework for mitigating knowledge base poisoning in RAG systems. The framework treats RAG knowledge base poisoning as analogous to a software supply chain attack and, on that basis, integrates supply chain provenance verification into the RAG knowledge pipeline.
Key Features of RAGShield
- C2PA-inspired Cryptographic Document Attestation: RAGShield incorporates cryptographic document attestation mechanisms that block unsigned and forged documents during the ingestion process. This ensures that only verified documents are considered in the knowledge base.
- Trust-Weighted Retrieval: The framework prioritizes provenance-verified sources in its retrieval processes, enhancing the trustworthiness of the information presented to users.
- Formal Taint Lattice: RAGShield features a formal taint lattice with cross-source contradiction detection, enabling it to catch insider threats even when the provenance of the documents is valid.
- Provenance-Aware Generation: The system supports provenance-aware generation with auditable citations, allowing users to trace the origin of the information and thus reinforce accountability.
- NIST SP 800-53 Compliance Mapping: RAGShield maps its framework to the NIST SP 800-53 standards across 15 control families, ensuring compliance with federal regulations and enhancing security protocols.
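The first two layers, attestation at ingestion and trust-weighted retrieval, can be illustrated with a minimal sketch. It is not RAGShield's implementation: the source does not give its signature scheme or scoring formula. The sketch assumes an HMAC-SHA256 tag as a standard-library stand-in for C2PA-style certificate signatures, and it assumes the ranking score is the similarity multiplied by a per-document trust weight. The names `TRUSTED_KEYS`, `Document`, `ingest`, and `trust_weighted_rank` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical signing key for one trusted publisher. A C2PA-style deployment
# would use certificate-based signatures; HMAC is only a stdlib stand-in here.
TRUSTED_KEYS = {"agency-publisher": b"demo-secret-key"}


@dataclass
class Document:
    doc_id: str
    text: str
    signer: Optional[str] = None
    signature: Optional[str] = None  # hex HMAC-SHA256 over the text


def sign(text: str, signer: str) -> str:
    """Produce the attestation tag a trusted publisher would attach."""
    key = TRUSTED_KEYS[signer]
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(doc: Document) -> bool:
    """Return True only for documents with a valid tag from a known signer."""
    if doc.signer is None or doc.signature is None:
        return False  # unsigned document
    key = TRUSTED_KEYS.get(doc.signer)
    if key is None:
        return False  # unknown signer
    expected = hmac.new(key, doc.text.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, doc.signature)


def ingest(candidates):
    """Ingestion gate: keep only documents that pass verification."""
    return [doc for doc in candidates if verify(doc)]


def trust_weighted_rank(scored, trust):
    """Rank (doc_id, similarity) pairs by similarity * trust weight.

    The multiplicative weighting is an assumption; documents with no
    recorded trust weight are treated as weight 0.0.
    """
    return sorted(scored, key=lambda pair: pair[1] * trust.get(pair[0], 0.0), reverse=True)
```

In this sketch a document with a missing or incorrect tag never reaches the index, and a highly similar passage from a low-trust source can rank below a less similar passage from a verified source.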
Evaluation and Results
RAGShield was evaluated on a 500-passage Natural Questions corpus that included 63 attack documents, using 200 queries against five tiers of adversaries. The evaluation measured a 0.0% attack success rate, even against adaptive attacks, with a 95% confidence interval of 0.0% to 1.9%. The framework also recorded a 0.0% false positive rate, meaning no legitimate documents were misclassified as malicious.
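A zero observed rate still carries a nonzero upper bound because the sample is finite. The source does not say how its interval was computed; the sketch below assumes a Wilson score interval over the 200 queries, which gives an upper bound of about 1.9% for zero successes and is therefore consistent with the reported figure. The function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Assumption: 0 successful attacks out of 200 queries.
lower, upper = wilson_interval(0, 200)
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

With zero successes the upper bound reduces to z^2 / (n + z^2), so it narrows only as the number of evaluated queries grows.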
This result does not extend to every threat. Insider in-place replacement attacks achieved a 17.5% attack success rate, which highlights the inherent limitations of ingestion-time defenses. The cross-source contradiction detector partly covers this gap: it identified subtle numerical manipulation attacks that bypass provenance verification entirely.
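How a contradiction check can catch a numerical manipulation in a validly signed document is sketched below. The source does not describe the detector's internals or the lattice's labels, so everything here is an assumption: a three-level lattice (`CLEAN`, `SUSPECT`, `TAINTED`) whose join is the maximum, passages already grouped by the fact they describe, and a strict-majority vote over the numbers each passage contains.

```python
import re
from collections import Counter
from enum import IntEnum
from functools import reduce


class Taint(IntEnum):
    """Hypothetical three-level taint lattice, ordered from least to most tainted."""
    CLEAN = 0
    SUSPECT = 1
    TAINTED = 2


def join(a: Taint, b: Taint) -> Taint:
    """Least upper bound: combining sources yields the worse of the two labels."""
    return Taint(max(a, b))


NUMBER = re.compile(r"\d+(?:\.\d+)?")


def flag_numeric_outliers(passages):
    """Label each source by whether its numbers contradict a strict majority.

    passages maps source_id -> text, where all texts describe the same fact.
    """
    values = {src: tuple(NUMBER.findall(text)) for src, text in passages.items()}
    majority, count = Counter(values.values()).most_common(1)[0]
    has_majority = count > len(values) / 2
    return {
        src: Taint.SUSPECT if has_majority and nums != majority else Taint.CLEAN
        for src, nums in values.items()
    }


passages = {
    "hr-portal": "The copay is 25 dollars per visit.",
    "benefits-faq": "Each visit has a copay of 25 dollars.",
    "edited-page": "The copay is 45 dollars per visit.",
}
labels = flag_numeric_outliers(passages)
# An answer that draws on all three passages inherits the worst label.
answer_taint = reduce(join, labels.values(), Taint.CLEAN)
```

Here `edited-page` is labeled `SUSPECT` even if its signature is valid, and the joined label propagates to any answer that uses it. A majority vote is only one possible rule and fails when most sources are compromised.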
Conclusion
RAGShield strengthens the security of RAG systems deployed across government agencies. By integrating supply chain provenance verification into a multi-layered defense strategy, it addresses the vulnerabilities that knowledge base poisoning attacks exploit. Frameworks of this kind help safeguard the integrity and reliability of automated systems in public service.
