SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems
Source: arXiv:2604.02767v1 (announce type: cross)
Abstract: When Agent A delegates to Agent B, which invokes Tool C on behalf of User X, no existing framework can answer: whose authorization chain led to this action, and where did it violate policy? This paper introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems.
Introduction to SentinelAgent
SentinelAgent targets secure, verifiable delegation in multi-agent AI systems, with a focus on federal deployments. Its core is a Delegation Chain Calculus (DCC) that defines the properties a delegation chain must satisfy to preserve authorization integrity.
Key Features of the Delegation Chain Calculus (DCC)
The DCC defines seven properties. Six are deterministic and one, Intent Preservation, is probabilistic:
- Authority Narrowing: Limits the scope of delegated authority.
- Policy Preservation: Ensures compliance with established policies.
- Forensic Reconstructibility: Allows for tracing actions back to their authorization origins.
- Cascade Containment: Prevents unintended propagation of actions.
- Scope-Action Conformance: Aligns actions with the defined scope of authority.
- Output Schema Conformance: Guarantees that outputs meet specified formats.
- Intent Preservation: A probabilistic property that maintains the original intent of the delegation.
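Three of the deterministic properties above can be illustrated with a small data model. The following is a minimal sketch, not the paper's formalization: the `Delegation` class, the function names, and the action names are all hypothetical, and it assumes that a scope is a plain set of permitted action names and that narrowing means each hop's scope is a subset of its parent's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """One hop in a delegation chain (hypothetical model, not the paper's)."""
    delegator: str
    delegate: str
    scope: frozenset                       # permitted action names
    parent: Optional["Delegation"] = None  # None marks the root grant


def narrows_authority(link: Delegation) -> bool:
    """Authority Narrowing: a hop never holds more than its parent granted."""
    return link.parent is None or link.scope <= link.parent.scope


def action_conforms(link: Delegation, action: str) -> bool:
    """Scope-Action Conformance: the action must lie inside the hop's scope."""
    return action in link.scope


def reconstruct(link: Delegation) -> list:
    """Forensic Reconstructibility: walk back to the root, return root-first."""
    path = []
    node = link
    while node is not None:
        path.append(f"{node.delegator}->{node.delegate}")
        node = node.parent
    return list(reversed(path))


# User X delegates to Agent A, which delegates to Agent B, which invokes Tool C.
root = Delegation("UserX", "AgentA",
                  frozenset({"read_records", "summarize", "send_email"}))
hop = Delegation("AgentA", "AgentB",
                 frozenset({"read_records", "summarize"}), parent=root)
bad = Delegation("AgentB", "ToolC",
                 frozenset({"read_records", "delete_records"}), parent=hop)

print(narrows_authority(hop))              # True
print(narrows_authority(bad))              # False: delete_records was never granted
print(action_conforms(hop, "send_email"))  # False: AgentB's scope excludes it
print(reconstruct(bad))
```

Storing a parent pointer on each hop is one simple way to make the chain traceable from any action back to the originating user; a production system would presumably also need signed, tamper-evident records, which this sketch omits.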
Practical Implications and Performance
The paper presents four meta-theorems and one proposition that establish the practical infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) enforces all seven properties at runtime through a Delegation Authority Service (DAS).
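To make the split between deterministic and probabilistic enforcement concrete, here is a hedged sketch of what a runtime authorization gate could look like. It is not the DAS described in the paper: `Request`, `Decision`, `authorize`, the reason strings, and the 0.5 threshold are all assumptions made for illustration, and the intent scorer is injected as a plain callable rather than a real NLI model. Only two of the deterministic checks are shown.

```python
from typing import Callable, NamedTuple


class Request(NamedTuple):
    granted_scope: frozenset    # scope held by the delegator
    requested_scope: frozenset  # scope the delegate asks for
    action: str                 # action the delegate wants to perform
    original_intent: str        # the user's stated task
    delegated_task: str         # the task text passed down the chain


class Decision(NamedTuple):
    allowed: bool
    reasons: tuple              # names of the checks that failed


def authorize(req: Request,
              intent_score: Callable[[str, str], float],
              threshold: float = 0.5) -> Decision:
    """Run deterministic checks and a thresholded intent check (hypothetical)."""
    reasons = []
    # Deterministic checks: set comparisons that do not depend on any model.
    if not req.requested_scope <= req.granted_scope:
        reasons.append("authority_narrowing")
    if req.action not in req.requested_scope:
        reasons.append("scope_action_conformance")
    # Probabilistic check: only as reliable as the scorer behind it.
    if intent_score(req.original_intent, req.delegated_task) < threshold:
        reasons.append("intent_preservation")
    return Decision(allowed=not reasons, reasons=tuple(reasons))


def fooled_scorer(original: str, delegated: str) -> float:
    """Stand-in for a compromised intent verifier: approves everything."""
    return 1.0


req = Request(
    granted_scope=frozenset({"read_records", "summarize"}),
    requested_scope=frozenset({"read_records", "delete_records"}),
    action="delete_records",
    original_intent="Summarize last quarter's case records",
    delegated_task="Tidy up the case records",
)
decision = authorize(req, fooled_scorer)
print(decision.allowed, decision.reasons)  # False ('authority_narrowing',)
```

Even with a scorer that approves every request, the out-of-scope delegation is rejected, because the scope checks do not consult the scorer at all.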
In testing on DelegationBench v4, which comprises 516 scenarios across 10 attack categories and 13 federal domains, the DAS achieved a 100% true positive rate (TPR) and a 0% false positive rate (FPR). Under black-box adversarial conditions, it blocked all 30 attacks with zero false positives.
Resilience Against Adversarial Attacks
The study found that the deterministic properties held under adversarial stress testing. Intent verification did not: its TPR fell to 13% against sophisticated paraphrasing. Fine-tuning the natural language inference (NLI) model on 190 government delegation examples raised intent-verification TPR from 1.7% to 88.3%, with an F1 score of 82.1% under five-fold cross-validation.
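For readers less familiar with the metrics quoted here, the sketch below spells out the standard definitions of TPR, FPR, and F1. The function names are illustrative. The benign-request count of 20 is a hypothetical placeholder, since the summary does not state it. The implied-precision figure is a back-of-envelope inference that assumes the reported F1 and TPR come from the same pooled confusion matrix, which the summary does not confirm; fold-averaged metrics would not satisfy the identity exactly.

```python
def tpr(tp: int, fn: int) -> float:
    """True positive rate (recall): share of attacks that were caught."""
    return tp / (tp + fn)


def fpr(fp: int, tn: int) -> float:
    """False positive rate: share of benign requests wrongly blocked."""
    return fp / (fp + tn)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1_score: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to solve for precision P."""
    return f1_score * recall / (2 * recall - f1_score)


# Black-box result from the text: 30 of 30 attacks blocked, zero false positives.
print(tpr(tp=30, fn=0))  # 1.0
print(fpr(fp=0, tn=20))  # 0.0 (tn=20 is a hypothetical benign count)

# Fine-tuned NLI result from the text: TPR 88.3%, F1 82.1%.
implied_precision = precision_from_f1(0.821, 0.883)
print(round(implied_precision, 3))  # 0.767
```

Under that assumption, an implied precision near 77% would mean that roughly one in four delegations flagged by the fine-tuned intent check is a false alarm.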
Verification and Compliance
Properties P1 and P3-P7 were verified with TLA+ model checking across 2.7 million states, with zero violations. The omitted P2 is presumably the probabilistic intent property, which this kind of exhaustive check does not cover. Even when intent verification is compromised, the remaining six properties constrain what an adversary can do: agents are limited to permitted API calls, outputs must conform to their schemas, actions remain traceable, and behavior remains policy-compliant.
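The idea behind model checking an invariant can be shown at toy scale. The sketch below is Python, not TLA+, and is not the paper's specification. It enumerates every delegation chain up to a fixed depth over a three-action universe and checks a subset-based narrowing invariant at each state. The action names, the depth bound, and both delegation rules are hypothetical.

```python
from itertools import combinations

ACTIONS = ("read", "write", "send")  # hypothetical action universe
MAX_DEPTH = 3                        # longest chain explored


def subsets(scope: frozenset):
    """Yield every subset of a scope, including the empty set and itself."""
    items = sorted(scope)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def explore(delegate):
    """Enumerate every chain up to MAX_DEPTH; return (states, violations)."""
    frontier = [(frozenset(ACTIONS),)]  # a chain is a tuple of scopes, root first
    states, violations = 0, []
    while frontier:
        chain = frontier.pop()
        states += 1
        # Invariant: each hop's scope is a subset of the previous hop's scope.
        if any(not child <= parent for parent, child in zip(chain, chain[1:])):
            violations.append(chain)
            continue
        if len(chain) < MAX_DEPTH:
            for nxt in delegate(chain[-1]):
                frontier.append(chain + (nxt,))
    return states, violations


def narrowing_delegate(scope):
    """Correct rule: a delegate may receive any subset of the current scope."""
    return subsets(scope)


def buggy_delegate(scope):
    """Faulty rule: also lets a hop add the 'send' action back."""
    yield from subsets(scope)
    yield scope | {"send"}


states, violations = explore(narrowing_delegate)
print(states, len(violations))  # 36 0

_, bad = explore(buggy_delegate)
print(len(bad) > 0)             # True: the checker finds counterexample chains
```

A real model checker explores interleavings of a full protocol specification rather than a single rule, but the principle is the same: visit every reachable state and report any state where the invariant fails.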
Conclusion
SentinelAgent provides a formal framework for secure, verifiable delegation chains in federal multi-agent AI systems. Its deterministic properties are enforced at runtime and held under adversarial testing, while intent preservation remains probabilistic and is the weakest point against adversarial paraphrasing.
