SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support
Deploying large language model (LLM)-powered agents in enterprise environments, particularly in cloud technical support, requires high-quality, domain-specific skills. Traditional skill creators, however, often lack domain grounding and produce skills that are misaligned with actual task requirements. Moreover, once skills are deployed, there is no systematic way to trace execution failures back to deficiencies in the skills themselves, so skill quality stagnates even as operational evidence accumulates.
To address these challenges, the research introduces SkillForge, a self-evolving framework that closes the end-to-end creation-evaluation-refinement loop, so that skills start out well aligned and keep improving after deployment.
Key Components of SkillForge
SkillForge combines a skill creator with a three-stage refinement pipeline:
- Domain-Contextualized Skill Creator: Produces well-aligned initial skills by grounding skill synthesis in knowledge bases and historical support tickets, so that the generated skills reflect real-world requirements.
- Failure Analyzer: After deployment, analyzes execution failures in batch to identify where skills fall short.
- Skill Diagnostician: Pinpoints the underlying skill deficiencies that cause the identified failures.
- Skill Optimizer: Rewrites the skill to eliminate the deficiencies found in the previous stages.
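The three post-deployment stages can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Skill` and `ExecutionRecord` types, the `failure_note` field, and the rule-based stage functions are hypothetical stand-ins, and the source does not describe how SkillForge implements these stages internally.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str
    version: int = 1


@dataclass(frozen=True)
class ExecutionRecord:
    task_id: str
    succeeded: bool
    failure_note: str = ""  # hypothetical free-text note on what went wrong


def analyze_failures(records):
    """Failure Analyzer stand-in: batch-count failure notes across all records."""
    return Counter(r.failure_note for r in records if not r.succeeded)


def diagnose(failure_counts, min_count=2):
    """Skill Diagnostician stand-in: treat recurring failures as skill deficiencies."""
    return sorted(note for note, n in failure_counts.items() if n >= min_count)


def optimize(skill, deficiencies):
    """Skill Optimizer stand-in: rewrite the skill with one added fix per deficiency."""
    if not deficiencies:
        return skill
    fixes = "\n".join(f"- Address: {d}" for d in deficiencies)
    return Skill(skill.name, f"{skill.instructions}\n{fixes}", skill.version + 1)


# Made-up execution records for a made-up skill.
records = [
    ExecutionRecord("t1", True),
    ExecutionRecord("t2", False, "missing region check"),
    ExecutionRecord("t3", False, "missing region check"),
    ExecutionRecord("t4", False, "timeout"),
]
skill = Skill("disk-full-triage", "Check disk usage first.")
revised = optimize(skill, diagnose(analyze_failures(records)))
print(revised.version)
print(revised.instructions)
```

In this toy version, only the failure that recurs across the batch is treated as a deficiency of the skill, and the one-off timeout is ignored. The threshold `min_count` is an arbitrary illustrative choice, not a parameter taken from the source.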
Iterative Self-Optimization
SkillForge runs this cycle iteratively. After each round of deployment, failures are analyzed, the responsible deficiencies are diagnosed, and the skill is rewritten, so skills are not static artifacts but improve with every round of deployment feedback.
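The iterative cycle can be sketched as a simple loop. This is a toy model under stated assumptions, not the framework's implementation: a skill is reduced to a set of step names, and `deploy`, `refine`, and `evolve` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def deploy(skill_steps, tasks):
    """Hypothetical deployment: a task fails if the skill lacks the step it needs."""
    return [needed for needed in tasks if needed not in skill_steps]


def refine(skill_steps, failures):
    """Hypothetical refinement: add the step behind the most frequent failure."""
    if not failures:
        return skill_steps
    most_common_step, _ = Counter(failures).most_common(1)[0]
    return skill_steps | {most_common_step}


def evolve(skill_steps, tasks, rounds):
    """Run deploy -> refine for a fixed number of rounds, logging failures per round."""
    history = []
    for _ in range(rounds):
        failures = deploy(skill_steps, tasks)
        history.append(len(failures))
        skill_steps = refine(skill_steps, failures)
    return skill_steps, history


# Each task is represented by the single step it requires (made-up data).
tasks = ["check_quota", "check_quota", "check_region", "restart", "check_quota"]
final_steps, history = evolve(frozenset({"restart"}), tasks, rounds=3)
print(history)  # failures per round: [4, 1, 0]
```

Each round fixes only the most frequent failure, so the failure count drops round by round rather than all at once, which mirrors the idea of gradual improvement from deployment feedback.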
Experimental Validation
SkillForge was evaluated on five real-world cloud support scenarios comprising 1,883 tickets and 3,737 tasks. The experiments yield two main findings:
- The Domain-Contextualized Skill Creator generates substantially better initial skills than traditional generic skill creators, as measured by consistency with expert-authored reference responses derived from historical tickets.
- The self-evolution loop progressively improves skill quality regardless of where the initial skill comes from, whether expert-authored, domain-created, or generic. This finding indicates that automated evolution can surpass even manually curated expert knowledge.
In conclusion, SkillForge provides a framework for creating domain-specific agent skills for cloud technical support and for improving them continuously through a self-evolving process.
