D2Skill: Dynamic Dual-Granularity Skill Bank for RL

Dynamic Dual-Granularity Skill Bank for Agentic RL

Source: arXiv:2603.28716v1

Abstract: Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.

Introduction

Reinforcement learning (RL) trains agents to perform complex tasks through interaction with an environment, but standard methods often make poor use of past experience. Reusable skills can improve learning efficiency, yet most existing skill-based methods extract only trajectory-level guidance and lack a principled mechanism for maintaining a skill memory as it evolves. This article summarizes D2Skill, an approach designed to address both limitations.

Key Features of D2Skill

  • Dynamic Dual-Granularity Skill Organization: D2Skill organizes reusable experience into two kinds of skills: task skills for high-level guidance and step skills for fine-grained decision support and error correction.
  • Joint Training Mechanism: The policy and the skill bank are trained together. The same policy produces paired baseline and skill-injected rollouts, and the performance gap between them serves as a hindsight utility signal for both skill updating and policy optimization.
  • Utility-Aware Skill Management: The skill bank is built entirely from training-time experience. It is expanded continuously through reflection and maintained with utility-aware retrieval and pruning.
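The features listed above can be pictured with a small data-structure sketch. This is an illustration under stated assumptions, not the paper's implementation: the names `Skill` and `SkillBank`, the keyword-overlap retrieval score, the `utility_weight` parameter, and the fixed-capacity pruning rule are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass
class Skill:
    text: str                  # natural-language guidance shown to the agent
    granularity: str           # "task" (high-level) or "step" (fine-grained)
    keywords: FrozenSet[str]   # hypothetical retrieval key (assumption)
    utility: float = 0.0       # running estimate of how much the skill helps


class SkillBank:
    """Toy skill bank with utility-aware retrieval and pruning (assumed design)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.skills: List[Skill] = []

    def add(self, skill: Skill) -> None:
        # Expand the bank, then prune so it never exceeds its capacity.
        self.skills.append(skill)
        self.prune()

    def retrieve(self, query_words: Iterable[str], granularity: str,
                 k: int = 1, utility_weight: float = 0.5) -> List[Skill]:
        # Score = keyword overlap (Jaccard) + weighted utility. Assumed formula.
        query = set(query_words)

        def score(skill: Skill) -> float:
            union = query | skill.keywords
            overlap = len(query & skill.keywords) / max(len(union), 1)
            return overlap + utility_weight * skill.utility

        candidates = [s for s in self.skills if s.granularity == granularity]
        return sorted(candidates, key=score, reverse=True)[:k]

    def prune(self) -> None:
        # Keep only the `capacity` highest-utility skills.
        if len(self.skills) > self.capacity:
            self.skills.sort(key=lambda s: s.utility, reverse=True)
            del self.skills[self.capacity:]


if __name__ == "__main__":
    bank = SkillBank(capacity=2)
    bank.add(Skill("Locate the target object before moving it.", "task",
                   frozenset({"find", "object"}), utility=0.5))
    bank.add(Skill("If a container is closed, open it first.", "step",
                   frozenset({"open", "drawer"}), utility=0.3))
    bank.add(Skill("Repeat the last action.", "step",
                   frozenset({"repeat"}), utility=-0.2))
    for skill in bank.retrieve(["open", "drawer"], "step"):
        print(skill.text)
```

Keeping task and step skills in one bank with a `granularity` field lets retrieval and pruning share the same utility score. A real system would likely use learned embeddings rather than keyword overlap.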

Methodology

D2Skill models skills at two granularities. Task skills give the agent high-level guidance for a task, while step skills support individual decisions and help correct errors. During training, the same policy generates paired rollouts, one baseline and one with skills injected. The performance gap between the two yields a hindsight utility signal that is used both to update the skill bank and to optimize the policy.
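The paired-rollout idea can be sketched in a few lines. This is a minimal illustration and not the authors' algorithm: the `Rollout` callable, the toy environment, and the exponential-moving-average update with learning rate `lr` are assumptions introduced here. The sketch covers only the skill-updating side of the signal and leaves out policy optimization.

```python
from typing import Callable, Dict, Optional

# rollout(task, skill) -> episode return; skill=None means the baseline rollout.
# Both calls are assumed to use the same policy.
Rollout = Callable[[str, Optional[str]], float]


def paired_gap(rollout: Rollout, task: str, skill: str) -> float:
    """Run a baseline and a skill-injected rollout and return their gap."""
    baseline_return = rollout(task, None)
    skill_return = rollout(task, skill)
    return skill_return - baseline_return


def update_utility(utilities: Dict[str, float], skill: str,
                   gap: float, lr: float = 0.5) -> float:
    """Move the skill's utility toward the observed gap (assumed EMA rule)."""
    old = utilities.get(skill, 0.0)
    utilities[skill] = old + lr * (gap - old)
    return utilities[skill]


def toy_rollout(task: str, skill: Optional[str]) -> float:
    """Stand-in for an environment episode: succeeds only with a matching skill."""
    helps = task == "fetch key" and skill == "open the drawer first"
    return 1.0 if helps else 0.0


if __name__ == "__main__":
    utilities: Dict[str, float] = {}
    for skill in ("open the drawer first", "repeat the last action"):
        gap = paired_gap(toy_rollout, "fetch key", skill)
        print(skill, update_utility(utilities, skill, gap))
```

A positive gap means the skill helped in hindsight, a gap near zero means it made no difference, and a negative gap marks it as a candidate for pruning.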

Experimental Results

D2Skill was evaluated on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507. It consistently improves success rates over skill-free baselines by 10 to 20 points. Ablations and further analyses indicate that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, that the learned skills exhibit higher utility and transfer across evaluation settings, and that the method adds only modest training overhead.

Conclusion

D2Skill offers a structured approach to reusing experience in agentic RL: skills are stored at two granularities, scored by the hindsight utility they show in paired rollouts, and pruned as training proceeds. As the field evolves, methods like D2Skill may inform how future agents are designed.


Lazarus Omolua, https://richlyai.com/blog