The Specification Gap: Coordination Failure Under Partial Knowledge in Code Agents
Source: arXiv:2603.24284v1 (cross-listed)
Abstract: When multiple LLM-based code agents independently implement parts of the same class, they must agree on shared internal representations, even when the specification leaves those choices implicit. We study this coordination problem across 51 class-generation tasks, progressively stripping specification detail from full docstrings (L0) to bare signatures (L3), and introducing opposing structural biases (lists vs. dictionaries) to stress-test integration.
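To make the coordination problem concrete, here is a minimal sketch of the kind of mismatch described above. The `Inventory` class, its method names, and the split of work between "agent A" and "agent B" are hypothetical illustrations, not tasks taken from the paper.

```python
class Inventory:
    """Hypothetical task class; the spec does not say how items are stored."""

    def __init__(self):
        # Agent A's choice: items live in a list of (name, quantity) tuples.
        self.items = []

    def add_item(self, name, quantity):
        # Written by agent A, consistent with the list representation.
        self.items.append((name, quantity))

    def get_quantity(self, name):
        # Written by agent B, who assumed self.items is a dict keyed by name.
        return self.items.get(name, 0)


inv = Inventory()
inv.add_item("bolt", 3)
try:
    inv.get_quantity("bolt")
    outcome = "ok"
except AttributeError as exc:
    # Lists have no .get method, so the two halves fail once combined.
    outcome = f"integration failure: {exc}"
print(outcome)
```

Each method is reasonable on its own; the failure only appears when the two halves are integrated, which is why the study measures integration accuracy rather than per-method correctness.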
Key Findings
We report three findings on coordination between code agents:
- Persistent Specification Gap: Two-agent integration accuracy falls from 58% to 25% as specification detail is stripped away, from the most detailed specifications to the most minimal ones. A single agent starts higher and ends higher over the same range, going from 89% to 56%. The resulting coordination gap of 25 to 39 percentage points is consistent across two Claude models (Sonnet, Haiku) and three independent runs.
- AST-based Conflict Detector: We built a conflict detector based on abstract syntax trees (ASTs) that reaches 97% precision at the lowest specification level without any additional LLM calls. A factorial recovery experiment shows, however, that restoring the full specification recovers the single-agent accuracy ceiling (89%), whereas adding conflict reports yields no measurable improvement.
- Decomposing the Coordination Gap: The coordination gap splits into two components: coordination cost (+16 percentage points) and information asymmetry (+11 percentage points). This suggests that the two effects are independent and additive. The gap is therefore not only a matter of hidden information; it also reflects the difficulty of generating compatible code without shared decision-making.
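The summary does not describe how the AST-based detector works internally, so the following is only a rough sketch of one way such a check could be built with Python's standard `ast` module. It is not the authors' implementation. The function names (`infer_kinds`, `find_conflicts`), the method-name sets, and the sample agent code are all assumptions made for illustration.

```python
import ast
import textwrap

# Assumed heuristics: method names that hint at a list or a dict.
LIST_METHODS = {"append", "extend", "insert", "sort", "reverse"}
DICT_METHODS = {"get", "keys", "values", "items", "setdefault", "update"}


def _self_attr(node):
    """Return the attribute name if node is `self.<name>`, else None."""
    if (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"):
        return node.attr
    return None


def infer_kinds(source):
    """Map each self.<attr> to the container kinds its usage implies."""
    kinds = {}
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            # Literal initialisation: self.x = [] or self.x = {}
            for target in node.targets:
                name = _self_attr(target)
                if name is None:
                    continue
                if isinstance(node.value, ast.List):
                    kinds.setdefault(name, set()).add("list")
                elif isinstance(node.value, ast.Dict):
                    kinds.setdefault(name, set()).add("dict")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Method calls: self.x.append(...) or self.x.get(...)
            name = _self_attr(node.func.value)
            if name is None:
                continue
            if node.func.attr in LIST_METHODS:
                kinds.setdefault(name, set()).add("list")
            elif node.func.attr in DICT_METHODS:
                kinds.setdefault(name, set()).add("dict")
    return kinds


def find_conflicts(source_a, source_b):
    """Report attributes whose inferred kinds do not overlap between agents."""
    kinds_a, kinds_b = infer_kinds(source_a), infer_kinds(source_b)
    conflicts = {}
    for name in kinds_a.keys() & kinds_b.keys():
        if kinds_a[name].isdisjoint(kinds_b[name]):
            conflicts[name] = (sorted(kinds_a[name]), sorted(kinds_b[name]))
    return conflicts


# Hypothetical outputs from two agents working on the same class.
agent_a = """
def __init__(self):
    self.items = []

def add_item(self, name, quantity):
    self.items.append((name, quantity))
"""
agent_b = """
def get_quantity(self, name):
    return self.items.get(name, 0)
"""
print(find_conflicts(agent_a, agent_b))  # {'items': (['list'], ['dict'])}
```

A check like this runs entirely on the generated source and needs no extra model calls. Even so, the recovery experiment above suggests that flagging a conflict is not enough on its own: restoring the full specification recovered accuracy, while adding conflict reports made no measurable difference.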
Conclusion
These findings support a specification-first approach to multi-agent code generation. Richer specifications are both the primary coordination mechanism between agents and the necessary recovery tool when their implementations conflict. In practice, this means stating shared internal representations explicitly in the specification rather than leaving each agent to infer them.
As the field of AI and code generation continues to evolve, addressing these coordination challenges will be crucial for developing more effective and reliable multi-agent systems.
