Dynamic Dual-Granularity Skill Bank for Agentic RL
Summary: arXiv:2603.28716v1
Abstract: Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
Introduction
Reinforcement learning (RL) trains agents to perform complex tasks through interaction with an environment, but agents often fail to reuse what they learned in earlier episodes. Reusable skills can improve learning efficiency, yet existing skill-based methods mainly extract trajectory-level guidance and lack principled mechanisms for maintaining a skill memory as it evolves. This article summarizes D2Skill, a dynamic dual-granularity skill bank designed to address both limitations.
Key Features of D2Skill
- Dynamic Dual-Granularity Skill Organization: D2Skill organizes reusable experience into two kinds of skills: task skills for high-level guidance and step skills for fine-grained decision support and error correction.
- Joint Training Mechanism: The policy and the skill bank are trained together. The same policy produces paired baseline and skill-injected rollouts, and the performance gap between them yields hindsight utility signals used for both skill updating and policy optimization.
- Utility-Aware Skill Management: The skill bank is built entirely from training-time experience, expanded continuously through reflection, and maintained with utility-aware retrieval and pruning.
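To make the skill bank idea concrete, here is a minimal Python sketch of a bank that stores task and step skills, retrieves them by relevance plus utility, and prunes low-utility entries. It is an illustration only, not the authors' implementation: the names `Skill`, `SkillBank`, and `word_overlap`, the word-overlap relevance score, the `utility_weight` parameter, and the pruning thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skill:
    text: str              # natural-language guidance shown to the agent
    granularity: str       # "task" (high-level) or "step" (fine-grained)
    utility: float = 0.0   # running estimate of how much the skill helps
    uses: int = 0          # how many times the skill has been injected


def word_overlap(query: str, text: str) -> float:
    """Jaccard overlap of lowercase word sets; a stand-in for a real retriever."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


@dataclass
class SkillBank:
    skills: List[Skill] = field(default_factory=list)

    def add(self, text: str, granularity: str) -> Skill:
        if granularity not in ("task", "step"):
            raise ValueError("granularity must be 'task' or 'step'")
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def retrieve(self, query: str, granularity: str, k: int = 1,
                 utility_weight: float = 0.5) -> List[Skill]:
        """Rank skills of one granularity by relevance plus weighted utility."""
        pool = [s for s in self.skills if s.granularity == granularity]
        pool.sort(key=lambda s: word_overlap(query, s.text)
                  + utility_weight * s.utility, reverse=True)
        return pool[:k]

    def prune(self, min_utility: float = 0.0, min_uses: int = 3) -> int:
        """Drop skills that were tried often enough and still score low."""
        keep = [s for s in self.skills
                if s.uses < min_uses or s.utility >= min_utility]
        removed = len(self.skills) - len(keep)
        self.skills = keep
        return removed


bank = SkillBank()
bank.add("check the fridge before the cabinet when looking for food", "task")
bank.add("open a container before trying to take an object from it", "step")
stale = bank.add("look around the room", "step")
stale.utility, stale.uses = -0.2, 5   # made-up numbers for illustration

best = bank.retrieve("take apple from fridge container", "step", k=1)
print(best[0].text)
print("pruned:", bank.prune())
```

Keeping granularity as a field on one record type, rather than using two separate stores, lets retrieval and pruning share the same utility logic. A real system would likely replace the word-overlap score with an embedding-based retriever.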
Methodology
D2Skill models skills at two granularities. Task skills guide the agent’s high-level approach to a task, while step skills give fine-grained support for individual decisions and help correct errors. Training generates paired rollouts under the same policy: a baseline rollout without skills and a skill-injected rollout. The performance gap between the two serves as a hindsight utility signal that drives both skill updating and policy optimization.
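The paired-rollout idea can be sketched in a few lines of Python. The source does not give the exact formula for the utility signal, so this example assumes the simplest version: the gap is the mean return with skills minus the mean return without them, and each injected skill's utility moves toward that gap with an exponential moving average. The helper names, the `lr` parameter, and the stub `toy_episode` policy are hypothetical. How the signal feeds into policy optimization is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical episode runner: (task, injected skill texts) -> return in [0, 1].
EpisodeFn = Callable[[str, List[str]], float]


def paired_rollouts(run_episode: EpisodeFn, tasks: Sequence[str],
                    skills: List[str]) -> Tuple[List[float], List[float]]:
    """Run each task twice with the same policy: without and with skills."""
    baseline = [run_episode(task, []) for task in tasks]
    injected = [run_episode(task, skills) for task in tasks]
    return baseline, injected


def hindsight_gap(baseline: Sequence[float], injected: Sequence[float]) -> float:
    """Mean return with skills minus mean return without them."""
    return sum(injected) / len(injected) - sum(baseline) / len(baseline)


def update_utilities(utilities: Dict[str, float], skills: Sequence[str],
                     gap: float, lr: float = 0.1) -> None:
    """Move each injected skill's utility toward the observed gap (EMA)."""
    for skill in skills:
        old = utilities.get(skill, 0.0)
        utilities[skill] = (1.0 - lr) * old + lr * gap


def toy_episode(task: str, skills: List[str]) -> float:
    """Stub policy: succeeds on 'hard' tasks only when a skill is injected."""
    if task.startswith("hard"):
        return 1.0 if skills else 0.0
    return 1.0


tasks = ["easy-1", "hard-1", "hard-2", "easy-2"]
skills = ["open a container before taking an object from it"]
utilities: Dict[str, float] = {}

baseline, injected = paired_rollouts(toy_episode, tasks, skills)
gap = hindsight_gap(baseline, injected)
update_utilities(utilities, skills, gap, lr=0.5)
print(gap, utilities)
```

Because both rollouts come from the same policy, a positive gap can be attributed to the injected skills rather than to differences between policies. A negative gap would lower the skill's utility and make it a candidate for pruning.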
Experimental Results
D2Skill was evaluated on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507. It consistently improves success rates over skill-free baselines by 10 to 20 percentage points. Ablations and further analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains. The learned skills also exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
Conclusion
D2Skill gives agentic RL a structured way to reuse experience: skills at two granularities, trained jointly with the policy and maintained according to their measured utility. On ALFWorld and WebShop, this design yields higher success rates than skill-free baselines with only modest training overhead. As agentic RL continues to evolve, dynamic skill banks of this kind may become a useful component of agent training.
