Willful Disobedience: Automatically Detecting Failures in Agentic Traces
arXiv:2603.23806v1 (cross-listed)
Abstract
As artificial intelligence (AI) agents become increasingly integrated into real-world software systems, they are tasked with executing complex multi-step workflows. These workflows often involve multi-turn dialogues, tool invocations, and various intermediate decisions. However, the long execution histories of these processes, referred to as agentic traces, present significant challenges in validation. Traditional outcome-only benchmarks may overlook critical procedural failures, including:
- Incorrect workflow routing
- Unsafe tool usage
- Violations of prompt-specified rules
To address these challenges, this paper introduces AgentPex, an AI-powered tool that systematically evaluates agentic traces. AgentPex extracts behavioral rules directly from agent prompts and system instructions, then uses these specifications to automatically check each trace for compliance with the expected behaviors and protocols.
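The check-a-trace-against-extracted-rules idea can be illustrated with a minimal sketch. This is not AgentPex's implementation: the rule here is a hand-written predicate standing in for a rule that AgentPex would extract from the prompt, and the names Step, Rule, confirm_before_cancel, cancel_order, and evaluate are all hypothetical. The example shows a trace that reaches the right outcome (the order is cancelled) while still violating a prompt-specified rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One entry in an agentic trace (hypothetical schema)."""
    role: str                        # "user", "assistant", or "tool"
    content: str = ""
    tool_name: Optional[str] = None  # set when the assistant invokes a tool


@dataclass
class Rule:
    """A behavioral rule; `check` returns True when the trace complies."""
    rule_id: str
    description: str
    check: Callable[[List[Step]], bool]


def confirm_before_cancel(trace: List[Step]) -> bool:
    """Hypothetical rule: ask the user to confirm before calling cancel_order."""
    confirmed = False
    for step in trace:
        is_message = step.role == "assistant" and step.tool_name is None
        if is_message and "confirm" in step.content.lower():
            confirmed = True
        if step.tool_name == "cancel_order" and not confirmed:
            return False
    return True


def evaluate(trace: List[Step], rules: List[Rule]) -> List[str]:
    """Return the ids of all rules the trace violates."""
    return [rule.rule_id for rule in rules if not rule.check(trace)]


RULES = [
    Rule("confirm-before-cancel",
         "Ask for explicit confirmation before cancelling an order.",
         confirm_before_cancel),
]

# Outcome is correct (order cancelled), but the procedure skips confirmation.
bad_trace = [
    Step("user", "Please cancel order 123."),
    Step("assistant", tool_name="cancel_order"),
    Step("assistant", "Your order is cancelled."),
]

good_trace = [
    Step("user", "Please cancel order 123."),
    Step("assistant", "Can you confirm you want to cancel order 123?"),
    Step("user", "Yes."),
    Step("assistant", tool_name="cancel_order"),
    Step("assistant", "Your order is cancelled."),
]

bad_violations = evaluate(bad_trace, RULES)
good_violations = evaluate(good_trace, RULES)
```

An outcome-only score would treat both traces as successes, since the order ends up cancelled in each; only the rule check separates them.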
Evaluation of AgentPex
In our study, we evaluated AgentPex on a dataset comprising 424 traces from τ²-bench, which spans several domains, including telecom, retail, and airline customer service. The results of our evaluation demonstrate several key findings:
- AgentPex effectively distinguishes agent behavior across different models.
- It surfaces specification violations that are often missed by outcome-only scoring methods.
- The tool provides a fine-grained analysis by domain and metric.
These findings help developers understand the strengths and weaknesses of their AI agents at scale. Running AgentPex over collected traces lets teams check whether agents follow the rules their prompts specify, in addition to checking final outcomes.
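A breakdown by domain and metric can be pictured as a simple grouping over per-trace scores. The sketch below is illustrative only: the record layout, the metric names, and every score are made-up toy values, not results or schemas from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Toy per-trace scores (hypothetical; 1.0 = compliant, 0.0 = violation).
records = [
    {"model": "model-a", "domain": "retail",  "metric": "tool_safety", "score": 1.0},
    {"model": "model-a", "domain": "retail",  "metric": "tool_safety", "score": 0.0},
    {"model": "model-a", "domain": "airline", "metric": "routing",     "score": 1.0},
    {"model": "model-b", "domain": "airline", "metric": "routing",     "score": 0.0},
]


def breakdown(rows: List[dict],
              keys: Tuple[str, ...] = ("domain", "metric")) -> Dict[tuple, float]:
    """Average the score within each group defined by `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["score"])
    return {group: mean(scores) for group, scores in groups.items()}


by_domain_metric = breakdown(records)
by_model = breakdown(records, keys=("model",))
```

Changing `keys` switches the view, for example to ("model",) for a per-model comparison or ("model", "domain", "metric") for the finest grain.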
Conclusion
Integrating AI agents into complex software environments requires robust validation. AgentPex evaluates traces against behavioral specifications derived from the agent's own prompts, which moves evaluation beyond outcome-only assessment toward a fuller picture of agent performance.
In summary, AgentPex offers a way to surface procedural failures in agent deployments and provides a basis for future work on evaluating agentic behavior, in support of the safe and effective deployment of AI technologies.
