Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents
Summary: arXiv:2604.13108v1 Announce Type: cross
Abstract: AI coding agents spend a substantial fraction of their tool calls on undirected codebase exploration. We investigate whether providing agents with formal architecture descriptors can reduce this navigational overhead. We present three complementary studies. First, a controlled experiment (24 code localization tasks x 4 conditions, Claude Sonnet 4.6, temperature=0) demonstrates that architecture context reduces navigation steps by 33-44% (Wilcoxon p=0.009, Cohen’s d=0.92), with no significant format difference detected across S-expression, JSON, YAML, and Markdown. Second, an artifact-vs-process experiment (15 tasks x 3 conditions) demonstrates that an automatically generated descriptor achieves 100% accuracy versus 80% blind (p=0.002, d=1.04), proving direct navigational value independent of developer self-clarification. Third, an observational field study across 7,012 Claude Code sessions shows 52% reduction in agent behavioral variance. A writer-side experiment (96 generation runs, 96 error injections) reveals critical failure mode differences: JSON fails atomically, YAML silently corrupts 50% of errors, S-expressions detect all structural completeness errors. We propose intent.lisp, an S-expression architecture descriptor, and open-source the Forge toolkit.
Introduction
AI coding agents spend a substantial fraction of their tool calls on undirected codebase exploration. This work investigates whether giving agents formal architecture descriptors can reduce that navigational overhead, using three complementary studies and a writer-side experiment on descriptor formats.
Study Overview
- Controlled Experiment: The first study ran 24 code localization tasks under 4 conditions with Claude Sonnet 4.6 at temperature=0. Architecture context reduced navigation steps by 33-44% (Wilcoxon p=0.009, Cohen’s d=0.92, a large effect by the usual convention). No significant difference was detected among the S-expression, JSON, YAML, and Markdown formats.
- Artifact-vs-Process Experiment: The second study (15 tasks x 3 conditions) tested whether the descriptor itself helps, separately from any clarity a developer gains by writing one. An automatically generated descriptor achieved 100% accuracy versus 80% for the blind condition (p=0.002, d=1.04), which we take as evidence of direct navigational value independent of developer self-clarification.
- Observational Field Study: The third study observed 7,012 Claude Code sessions and found a 52% reduction in agent behavioral variance. As an observational result, it complements the two controlled experiments rather than establishing a causal effect on its own.
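The headline statistics above (a percentage reduction in navigation steps and a Cohen’s d effect size) can be illustrated with a minimal sketch. The step counts below are hypothetical and are not data from the studies, and the paired-samples form of Cohen’s d (mean difference divided by the standard deviation of the differences) is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev


def percent_reduction(baseline, treated):
    """Relative reduction in mean step count, as a percentage."""
    return 100.0 * (mean(baseline) - mean(treated)) / mean(baseline)


def cohens_d_paired(baseline, treated):
    """Cohen's d for paired samples: mean difference / SD of the differences.

    Assumption: this is one common variant; the studies may use another.
    """
    diffs = [b - t for b, t in zip(baseline, treated)]
    return mean(diffs) / stdev(diffs)


# Hypothetical per-task navigation step counts (NOT data from the paper).
steps_blind = [12, 9, 15, 11, 14, 10, 13, 12]
steps_with_descriptor = [8, 7, 9, 8, 9, 6, 8, 9]

if __name__ == "__main__":
    print(f"reduction: {percent_reduction(steps_blind, steps_with_descriptor):.1f}%")
    print(f"Cohen's d: {cohens_d_paired(steps_blind, steps_with_descriptor):.2f}")
```

For the significance test itself, SciPy provides a Wilcoxon signed-rank implementation as `scipy.stats.wilcoxon`; it is left out here to keep the sketch dependency-free.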
Failure Mode Analysis
In a writer-side experiment involving 96 generation runs and 96 error injections, we analyzed critical failure modes of different descriptor formats. The results revealed striking differences:
- JSON: Fails atomically: an error invalidates the descriptor as a whole rather than only the damaged part.
- YAML: Silently corrupts 50% of the injected errors: in those cases the damage goes undetected instead of raising an error.
- S-expressions: Detect all structural completeness errors.
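The JSON and S-expression behaviors listed above can be sketched with a truncation error, one kind of structural completeness error. This is an illustration under stated assumptions, not the experiment's actual injection or detection method: the descriptor contents and field names are hypothetical (they are not the intent.lisp schema), and `sexpr_balance_error` is a deliberately simple parenthesis counter written for this example. YAML is omitted because parsing it would require a third-party library.

```python
import json


def json_loads_or_none(text):
    """Return parsed JSON, or None when the parser rejects the whole document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def sexpr_balance_error(text):
    """Return None if parentheses balance, else a short description of the problem.

    Simplification: string literals and comments are not handled.
    """
    depth = 0
    for pos, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at offset {pos}"
    if depth > 0:
        return f"{depth} unclosed '('"
    return None


# Hypothetical descriptors; field names are illustrative only.
good_json = '{"module": "auth", "depends_on": ["db"]}'
truncated_json = '{"module": "auth", "depends_on": ["db"]'
good_sexpr = "(module auth (depends-on db))"
truncated_sexpr = "(module auth (depends-on db)"

if __name__ == "__main__":
    print(json_loads_or_none(good_json))         # parsed dict
    print(json_loads_or_none(truncated_json))    # None: nothing is recoverable
    print(sexpr_balance_error(good_sexpr))       # None: balanced
    print(sexpr_balance_error(truncated_sexpr))  # reports the unclosed paren
```

In this sketch the truncated JSON yields no usable data at all, while the truncated S-expression is flagged by a check that needs nothing beyond counting delimiters.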
Conclusion and Future Work
Based on these findings, we propose intent.lisp, an S-expression architecture descriptor, and open-source the Forge toolkit to support further research and development in this area. The aim is to reduce the navigational overhead that AI coding agents incur when exploring a codebase.
