EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning
arXiv:2604.09815v1
Abstract
Computer-use agents that combine GUI interaction with structured API calls via the Model Context Protocol (MCP) show promise for automating software tasks. However, existing approaches lack a principled understanding of how agents should balance these two modalities and how to enable iterative self-improvement across diverse applications.
We formulate MCP-GUI interplay as a unified hybrid policy learning problem where the agent learns when each modality provides complementary advantages. Our findings indicate that distillation and experience augmentation target fundamentally different failure modes, necessitating application-aware mechanism selection.
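As a rough illustration of what a unified hybrid action space could look like, the sketch below puts MCP calls and GUI actions behind one action type. All names here (`McpCall`, `GuiAction`, `choose_action`, the tool identifiers) are hypothetical and not taken from the paper, and the rule-based chooser is only a toy stand-in for a learned policy.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class McpCall:
    """A structured API call issued through an MCP tool (hypothetical shape)."""
    tool: str
    arguments: dict = field(default_factory=dict)


@dataclass
class GuiAction:
    """A low-level GUI interaction such as a click or keystroke (hypothetical shape)."""
    kind: str
    target: str


# One action type covering both modalities, so a single policy can emit either.
Action = Union[McpCall, GuiAction]


def choose_action(subgoal: str, mcp_tools: dict[str, str]) -> Action:
    """Toy stand-in for a learned hybrid policy.

    Assumption: use an MCP tool when one is registered for the sub-goal,
    otherwise fall back to a GUI action on the named target.
    """
    if subgoal in mcp_tools:
        return McpCall(tool=mcp_tools[subgoal])
    return GuiAction(kind="click", target=subgoal)
```

A learned policy would replace the dictionary lookup with a model that weighs the two modalities per step; the point of the sketch is only that both kinds of action share one output type.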
Proposed Framework
Building on this formulation, we propose a self-evolving framework whose fully automatic pipeline orchestrates four stages without manual intervention:
- Automatic environment generation and validation
- Trajectory collection
- Gap-driven task synthesis
- Quality-filtered training
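One round of the stages listed above might be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Trajectory` fields, the `evolve_once` function, the callables passed in, and the `min_quality` threshold are all hypothetical placeholders for components the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    task: str
    passed: bool
    quality: float  # hypothetical quality score in [0, 1]


def evolve_once(
    tasks: list[str],
    env_is_valid: Callable[[str], bool],
    run_agent: Callable[[str], Trajectory],
    synthesize: Callable[[list[str]], list[str]],
    min_quality: float = 0.5,
) -> tuple[list[Trajectory], list[str]]:
    """Run one self-evolution round and return (training set, new tasks)."""
    # 1. Environment validation: drop tasks whose generated environment is broken.
    runnable = [t for t in tasks if env_is_valid(t)]
    # 2. Trajectory collection: run the agent on every remaining task.
    trajectories = [run_agent(t) for t in runnable]
    # 3. Gap-driven synthesis: derive new tasks from the ones the agent failed.
    failed = [tr.task for tr in trajectories if not tr.passed]
    new_tasks = synthesize(failed)
    # 4. Quality filtering: keep only successful, high-quality trajectories.
    training_set = [
        tr for tr in trajectories if tr.passed and tr.quality >= min_quality
    ]
    return training_set, new_tasks
```

In a full loop, `new_tasks` would be fed back in as the next round's `tasks`, and `training_set` would go to whichever training step the framework uses.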
Key Innovations
A key innovation of our approach is the experience bank, which accumulates rules that an LLM derives by comparing trajectories. Because these rules are applied at inference time, the agent improves without fine-tuning.
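The inference-time side of an experience bank can be sketched as a store of textual rules that are prepended to the agent's prompt. The class name, method names, per-application grouping, and prompt wording below are assumptions made for illustration; the step in which an LLM derives rules by comparing trajectories is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceBank:
    """Stores learned rules per application (hypothetical structure)."""
    rules: dict[str, list[str]] = field(default_factory=dict)

    def add_rule(self, app: str, rule: str) -> None:
        # Skip exact duplicates so repeated lessons do not bloat the prompt.
        bucket = self.rules.setdefault(app, [])
        if rule not in bucket:
            bucket.append(rule)

    def augment_prompt(self, app: str, task_prompt: str) -> str:
        # No model weights change: the rules only alter the prompt text.
        rules = self.rules.get(app, [])
        if not rules:
            return task_prompt
        lines = "\n".join(f"- {r}" for r in rules)
        return f"Lessons from earlier attempts:\n{lines}\n\n{task_prompt}"
```

For example, after `bank.add_rule("editor", "Save via the MCP tool before closing the window.")`, a call to `bank.augment_prompt("editor", "Rename the file.")` returns the task prompt preceded by that rule.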
Cross-Application Analysis
Our systematic cross-application analysis of three desktop applications reveals that the best improvement strategy depends on the MCP-GUI composition of the tasks:
- Distillation achieves a 77.8% pass rate on MCP-dominant tasks, a gain of 17.8 percentage points.
- The experience bank excels on GUI-intensive tasks, with a gain of 10.0 percentage points.
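The application-aware selection suggested by these results could be expressed as a simple rule on the share of MCP steps in a task. The function names and the 0.5 threshold below are assumptions for illustration only; the abstract does not state a concrete decision rule. The second helper merely reproduces the arithmetic behind a percentage-point gain, assuming the gain is measured against a single reference pass rate.

```python
def select_mechanism(mcp_steps: int, gui_steps: int, threshold: float = 0.5) -> str:
    """Pick an improvement mechanism from a task's MCP-GUI composition.

    Assumption: tasks whose MCP share exceeds `threshold` count as
    MCP-dominant; everything else is treated as GUI-intensive.
    """
    total = mcp_steps + gui_steps
    if total == 0:
        raise ValueError("task has no steps")
    mcp_share = mcp_steps / total
    return "distillation" if mcp_share > threshold else "experience_bank"


def implied_reference_rate(pass_rate: float, gain_pp: float) -> float:
    """Pass rate before the gain, in percent, rounded to one decimal place."""
    return round(pass_rate - gain_pp, 1)
```

Under that assumption, a 77.8% pass rate with a gain of 17.8 percentage points corresponds to a reference pass rate of 60.0%.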
Conclusion
The results show that the interplay between the MCP and GUI modalities should inform how an agent is improved. By combining automated environment generation with experience learning, the self-evolving framework lets agents improve iteratively across applications without manual intervention. The cross-application analysis further indicates that the choice between distillation and the experience bank should follow each application's MCP-GUI composition.
