Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
arXiv:2604.05808v1
Abstract
Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories.
Introduction
Large language models have become central to applications ranging from natural language processing to interactive decision-making. However, the prevailing approach of conditioning an agent on its full interaction history scales poorly. The prompt grows with every step, so per-step inference cost rises as an episode lengthens, and long-horizon tasks quickly become expensive. As tasks grow in complexity, this reliance on long interaction sequences becomes a bottleneck for both efficiency and performance.
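A back-of-the-envelope cost model makes this scaling concrete. Assume, purely for illustration, that every interaction step adds the same number of tokens `k`, and that a step-level agent instead carries one transition plus a fixed-size progress summary of `s` tokens. The function names and numbers below are hypothetical and are not measurements from the paper.

```python
# Toy cost model (assumption): every interaction step contributes
# `tokens_per_step` tokens (observation + action), and the step-level
# agent additionally carries a fixed-size progress summary.


def history_conditioned_tokens(num_steps: int, tokens_per_step: int) -> int:
    """Total prompt tokens when the prompt at step t contains all t transitions so far."""
    return sum(tokens_per_step * t for t in range(1, num_steps + 1))


def step_level_tokens(num_steps: int, tokens_per_step: int, summary_tokens: int) -> int:
    """Total prompt tokens when each prompt holds one transition plus a bounded summary."""
    return num_steps * (tokens_per_step + summary_tokens)


if __name__ == "__main__":
    T, k, s = 50, 100, 200
    print(history_conditioned_tokens(T, k))  # 127500, i.e. k*T*(T+1)/2
    print(step_level_tokens(T, k, s))        # 15000, i.e. T*(k+s)
```

Under these assumptions, the cumulative prompt cost of history conditioning grows quadratically in the number of steps T, as kT(T+1)/2, while the step-level cost grows linearly, as T(k+s).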
Proposed Framework: STEP-HRL
STEP-HRL addresses these challenges by decomposing tasks hierarchically into subtasks. The framework is designed to:
- Condition learning on single-step transitions instead of entire interaction histories.
- Structure tasks hierarchically, using the completed subtasks to represent global progress on the overall task.
- Implement a local progress module that iteratively summarizes interaction history within each subtask, yielding a compact representation of local progress.
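The three ingredients above can be pictured as a single per-step input record. The sketch below is a minimal illustration under stated assumptions: the class `AugmentedTransition`, the helper `update_local_summary`, and the prompt layout are hypothetical and are not taken from the STEP-HRL repository. The character-truncation summarizer is only a toy stand-in for the paper's LLM-based local progress module, which summarizes selectively rather than by truncation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical names throughout; this is not the STEP-HRL codebase.


@dataclass
class AugmentedTransition:
    """Everything a policy sees at one step, instead of the full history."""
    task: str
    completed_subtasks: List[str]  # global progress: subtasks already finished
    local_summary: str             # compact local progress within the current subtask
    observation: str               # the current (single-step) observation

    def to_prompt(self) -> str:
        done = ", ".join(self.completed_subtasks) or "none"
        return (
            f"Task: {self.task}\n"
            f"Completed subtasks: {done}\n"
            f"Progress in current subtask: {self.local_summary or 'none'}\n"
            f"Observation: {self.observation}"
        )


def update_local_summary(summary: str, action: str, observation: str,
                         max_chars: int = 200) -> str:
    """Toy stand-in for an LLM summarizer: append the newest step, keep only the last max_chars."""
    entry = f"{action} -> {observation}"
    merged = f"{summary}; {entry}" if summary else entry
    return merged[-max_chars:]


if __name__ == "__main__":
    summary = ""
    summary = update_local_summary(summary, "open drawer", "drawer is open")
    summary = update_local_summary(summary, "take key", "you hold the key")
    step = AugmentedTransition("unlock the door", ["go to desk"], summary, "you see a door")
    print(step.to_prompt())
```

The intent of this layout is that the prompt grows with the number of completed subtasks and the bounded summary, not with every raw interaction step.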
Methodology
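The core of STEP-HRL is its two-level design, which separates a high-level policy from a low-level policy. Rather than consuming the full interaction history, both policies operate on augmented step-level transitions: the current single-step transition combined with global progress (the completed subtasks) and the local progress summary. According to the authors, this design yields: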
- Improved performance in decision-making tasks.
- Increased generalization capabilities across various environments.
- Significant reductions in token usage during interactions.
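The two-level control flow can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function `run_episode`, its callable arguments, and the toy scripted policies are all hypothetical, and the paper's policies are LLMs trained with reinforcement learning rather than hand-written stand-ins. The sketch assumes the high-level policy proposes the next subtask and the low-level policy emits primitive actions, which is the usual division of labor in HRL.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures (assumptions, not the paper's API):
#   high_policy(task, completed_subtasks, observation) -> next subtask, or None when finished
#   low_policy(subtask, local_summary, observation)    -> primitive action
#   env_step(action)                                   -> (observation, subtask_done)
#   summarize(local_summary, action, observation)      -> updated local summary


def run_episode(task: str,
                first_observation: str,
                high_policy: Callable[[str, List[str], str], Optional[str]],
                low_policy: Callable[[str, str, str], str],
                env_step: Callable[[str], Tuple[str, bool]],
                summarize: Callable[[str, str, str], str],
                max_steps: int = 50) -> List[str]:
    """Run one episode and return the list of completed subtasks."""
    completed: List[str] = []  # global progress shown to the high-level policy
    observation = first_observation
    steps = 0
    while steps < max_steps:
        subtask = high_policy(task, completed, observation)
        if subtask is None:    # high level judges the overall task finished
            break
        local_summary = ""     # local progress resets for each new subtask
        done = False
        while not done and steps < max_steps:
            # The low level sees one step of context, never the raw history.
            action = low_policy(subtask, local_summary, observation)
            observation, done = env_step(action)
            local_summary = summarize(local_summary, action, observation)
            steps += 1
        if done:
            completed.append(subtask)
    return completed


if __name__ == "__main__":
    plan = ["find key", "unlock door"]
    counter = {"n": 0}

    def high_policy(task, completed, obs):
        return plan[len(completed)] if len(completed) < len(plan) else None

    def low_policy(subtask, summary, obs):
        return f"act on {subtask}"

    def env_step(action):
        # In this toy world, every second action completes the current subtask.
        counter["n"] += 1
        return f"obs {counter['n']}", counter["n"] % 2 == 0

    def summarize(summary, action, obs):
        return obs  # keep only the latest observation as "local progress"

    print(run_episode("escape the room", "start", high_policy, low_policy, env_step, summarize))
    # ['find key', 'unlock door']
```

Note that `low_policy` receives only the current subtask, the running local summary, and the latest observation, and the summary is reset whenever a new subtask begins.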
Experimental Results
Extensive experiments were conducted on two benchmark environments, ScienceWorld and ALFWorld. The results show that STEP-HRL consistently outperforms the baselines. Key findings include:
- Enhanced decision-making performance compared to traditional LLM agents.
- Greater adaptability in diverse scenarios, showcasing improved generalization.
- Substantial reduction in the number of tokens utilized, leading to lower computational costs.
Conclusion
In summary, STEP-HRL replaces full-history conditioning with augmented step-level transitions, built from a hierarchical task structure that tracks completed subtasks and a local progress module that iteratively summarizes interaction within each subtask. This addresses a key limitation of existing LLM agents, the growing cost of long interaction histories, while improving decision-making performance, generalization, and token efficiency on ScienceWorld and ALFWorld.
The code for STEP-HRL is available at https://github.com/TonyStark042/STEP-HRL.
