From Plan to Action: How Well Do Agents Follow the Plan?
Summary: arXiv:2604.12147v1 Announce Type: cross
Abstract: Agents aspire to eliminate the need for task-specific prompt crafting through autonomous reason-act-observe loops. Still, they are commonly instructed to follow a task-specific plan for guidance, e.g., to resolve software issues following phases for navigation, reproduction, patch, and validation. Unfortunately, it is unknown to what extent agents actually follow such instructed plans.
Without such an analysis, determining the extent agents comply with a given plan is crucial. It is impossible to assess whether a solution was reached through correct strategic reasoning or through other means, such as data contamination or overfitting to a benchmark. This paper presents the first extensive, systematic analysis of plan compliance in programming agents, examining 16,991 trajectories from SWE-agent across four LLMs on SWE-bench Verified and SWE-bench Pro under eight plan variations.
Key Findings
The study reveals several significant insights into how agents adhere to their instructed plans:
- Without an explicit plan, agents tend to revert to workflows that were internalized during training, which are often incomplete or inconsistently applied.
- Providing a standard plan enhances issue resolution capabilities among agents.
- Periodic reminders of the plan can mitigate violations and enhance task success rates.
- A subpar plan can negatively impact performance more than having no plan at all.
- Surprisingly, augmenting a plan with additional task-relevant phases in the early stage can degrade performance, especially when these phases do not align with the model’s internal problem-solving strategy.
Implications for Future Research
These findings underscore a significant research gap in the development of programming agents. There is a pressing need for fine-tuning paradigms that focus on teaching models to diligently follow instructed plans. This approach diverges from merely encoding task-specific plans into the models, advocating instead for a focus on adaptive reasoning and action.
As agents evolve, the emphasis should be on enhancing their ability to reason and act adaptively rather than relying solely on memorized workflows. This shift could lead to the development of more robust, intelligent agents capable of navigating complex tasks with greater efficacy.
Conclusion
The systematic analysis of plan compliance in programming agents offers vital insights into their operational effectiveness. By understanding how agents interact with their instructed plans, researchers and developers can work towards creating more intelligent systems that not only perform tasks but do so with a strategic mindset.
