CodeTracer: Towards Traceable Agent States
arXiv:2604.11641v3
Introduction
Code agents are advancing rapidly, but debugging them is becoming harder. As frameworks orchestrate parallel tool calls and multi-stage workflows, it is increasingly difficult to observe how an agent's state changes and how errors propagate through a run.
The Challenge of Debugging Code Agents
During a run, a single early misstep can trap an agent in unproductive loops or cascade into fundamental errors. The resulting error chains are hidden, which makes it hard to tell when and why the agent went off track. Existing agent-tracing methods either target simple interactions or rely on small-scale manual inspection, which limits their scalability and usefulness in real coding workflows.
Introducing CodeTracer
To address these issues, we present CodeTracer, a tracing architecture that makes code agent states observable. CodeTracer parses heterogeneous run artifacts through a set of evolving extractors and reconstructs the agent's full state transition history as a hierarchical trace tree with persistent memory.
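To make the idea of extractors feeding a hierarchical trace tree concrete, here is a minimal sketch. It is not CodeTracer's implementation: the record fields (`phase`, `tool`, `exit_code`, `message`), the `TraceNode` class, and both extractor functions are hypothetical names chosen for illustration, and persistent memory is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TraceNode:
    """One node in a hypothetical trace tree: a run, a stage, or a step."""
    kind: str                      # "run", "stage", or "step"
    label: str
    status: str = "ok"             # "ok" or "error"
    children: List["TraceNode"] = field(default_factory=list)

    def add(self, child: "TraceNode") -> "TraceNode":
        self.children.append(child)
        return child


# An extractor maps one raw artifact record to a normalized event dict,
# or returns None when it does not recognize the record's format.
Extractor = Callable[[dict], Optional[dict]]


def extract_tool_call(record: dict) -> Optional[dict]:
    """Hypothetical extractor for tool-call records."""
    if "tool" not in record:
        return None
    failed = record.get("exit_code", 0) != 0
    return {"stage": record.get("phase", "unknown"),
            "label": f"tool:{record['tool']}",
            "status": "error" if failed else "ok"}


def extract_message(record: dict) -> Optional[dict]:
    """Hypothetical extractor for plain agent messages."""
    if "message" not in record:
        return None
    return {"stage": record.get("phase", "unknown"),
            "label": f"msg:{record['message'][:30]}",
            "status": "ok"}


def build_trace_tree(records: List[dict], extractors: List[Extractor]) -> TraceNode:
    """Group normalized events into a run -> stage -> step tree."""
    root = TraceNode(kind="run", label="run")
    stages: Dict[str, TraceNode] = {}
    for record in records:
        event = None
        for extractor in extractors:
            event = extractor(record)
            if event is not None:
                break
        if event is None:
            continue  # no extractor understands this record yet
        stage = stages.get(event["stage"])
        if stage is None:
            stage = root.add(TraceNode(kind="stage", label=event["stage"]))
            stages[event["stage"]] = stage
        stage.add(TraceNode(kind="step", label=event["label"], status=event["status"]))
        if event["status"] == "error":
            stage.status = "error"  # a failing step marks its stage as failed
    return root
```

Keeping extractors as a plain list of functions is one way to read "evolving": supporting a new artifact format means appending a function rather than changing the tree builder. Records that no extractor recognizes are skipped here, which a real system would more likely log.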
Key Features of CodeTracer
- Hierarchical Trace Tree: CodeTracer reconstructs the agent's full state transition history as a tree, giving a structured view of state changes.
- Persistent Memory: The architecture retains crucial information over time, enabling deeper insights into the agent’s performance.
- Failure Onset Localization: CodeTracer pinpoints the origin of a failure and the downstream error chain that follows from it, which shortens debugging.
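Failure onset localization can be illustrated with a deliberately simple sketch. It assumes each step already carries an explicit `status` and a `uses` list naming the earlier steps whose output it consumed. Both fields are hypothetical, and this summary does not describe how CodeTracer actually performs localization. In real runs the onset step may look successful, so this only shows the shape of the output: one origin plus its downstream chain.

```python
from typing import Dict, List, Optional, Tuple


def localize_failure(steps: List[Dict]) -> Tuple[Optional[int], List[int]]:
    """Return (onset_index, downstream_chain) for an ordered list of steps.

    Each step is a dict with:
      "status": "ok" or "error"
      "uses":   indices of earlier steps whose output this step consumed
    Both fields are assumptions of this sketch.
    """
    # Onset: the earliest step marked as failed.
    onset = next((i for i, s in enumerate(steps) if s["status"] == "error"), None)
    if onset is None:
        return None, []

    # Downstream chain: later steps that transitively depend on the onset.
    tainted = {onset}
    chain: List[int] = []
    for i in range(onset + 1, len(steps)):
        if any(dep in tainted for dep in steps[i]["uses"]):
            tainted.add(i)
            chain.append(i)
    return onset, chain
```

A step that depends only on untainted steps stays out of the chain even if it runs after the onset, which is what separates an error chain from "everything that happened later".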
Systematic Evaluation with CodeTraceBench
To evaluate CodeTracer systematically, we constructed CodeTraceBench. The benchmark is built from a large collection of executed trajectories generated by four widely used code agent frameworks on diverse coding tasks such as bug fixing, refactoring, and terminal interaction. It provides supervision at both the stage and step levels for failure localization.
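Supervision at two granularities suggests scoring a predicted failure location at both levels. The sketch below shows one plausible way to do that. The record schema (`stage` and `step` keys) and the exact-match metric are assumptions, since this summary does not specify CodeTraceBench's data format or metrics.

```python
from typing import Dict, List


def localization_accuracy(predictions: List[Dict], labels: List[Dict]) -> Dict[str, float]:
    """Exact-match accuracy of predicted failure onsets at two granularities.

    Each item is a dict with a "stage" name and a "step" index within that
    stage (hypothetical schema). A step-level hit requires the stage to
    match as well.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return {"stage": 0.0, "step": 0.0}

    stage_hits = 0
    step_hits = 0
    for pred, gold in zip(predictions, labels):
        if pred["stage"] == gold["stage"]:
            stage_hits += 1
            if pred["step"] == gold["step"]:
                step_hits += 1

    total = len(labels)
    return {"stage": stage_hits / total, "step": step_hits / total}
```

Because a step-level hit implies a stage-level hit in this scheme, step accuracy can never exceed stage accuracy. The gap between the two shows how often a method finds the right phase of a run but misses the exact action.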
Results and Performance
Experiments show that CodeTracer substantially outperforms both direct prompting and lightweight baselines. Moreover, replaying its diagnostic signals consistently recovers originally failed runs under matched budgets. These results suggest that CodeTracer can make debugging code agents more efficient.
Availability
Both the source code and the dataset are publicly available, with the aim of supporting further research on code agent debugging.
Conclusion
As the complexity of code agents continues to grow, so does the need for robust debugging tools. CodeTracer represents a significant step forward in enabling developers to trace agent states effectively, ensuring smoother workflows and more reliable outcomes in AI-driven coding tasks.
