RAMP: Hybrid DRL for Online Learning of Numeric Action Models
arXiv:2604.08685v1
Abstract: Automated planning algorithms require an action model specifying the preconditions and effects of each action, but obtaining such a model is often hard. Learning action models from observations is feasible, but existing algorithms for numeric domains are offline, requiring expert traces as input. We propose the Reinforcement learning, Action Model learning, and Planning (RAMP) strategy for learning numeric planning action models online via interactions with the environment.
Introduction
Automated planning is used across fields such as robotics, artificial intelligence, and operations research, but planning algorithms require an action model that defines the preconditions and effects of each action, and obtaining such a model is often hard. Action models can be learned from observations, but existing learning algorithms for numeric domains work offline: they need expert-generated traces as input, which can be time-consuming or impractical to collect.
The RAMP Strategy
RAMP addresses this limitation by learning numeric action models online, through direct interaction with the environment rather than from expert traces. The hybrid strategy runs three components simultaneously:
- Deep Reinforcement Learning (DRL) Policy: RAMP trains a DRL policy that learns which actions to take from the feedback (rewards) it receives from the environment.
- Numeric Action Model Learning: from the transitions observed so far, RAMP learns a numeric action model specifying the preconditions and effects of each action.
- Planning: whenever possible, RAMP uses the learned action model to plan future actions, and the resulting plans are used to continue training the DRL policy.
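To make the action-model component more concrete, here is a deliberately simplified, hypothetical learner for a single grounded action. The summary does not describe RAMP's actual learning algorithm, so the class name `ToyNumericActionModel` and its model forms are assumptions for illustration only: the precondition is approximated by the per-fluent range of values in which the action was observed to succeed, and each effect is assumed to be a constant increment (x' = x + delta). Real numeric domains often need richer conditions and effects, for example linear ones.

```python
from dataclasses import dataclass, field

State = dict[str, float]  # numeric fluent name -> value


@dataclass
class ToyNumericActionModel:
    """Hypothetical, simplified model of ONE grounded action (not RAMP's learner).

    Assumed forms, chosen for illustration only:
      * precondition: per-fluent interval [lower, upper] covering the pre-states
        in which the action was observed to succeed;
      * effect: a constant increment per fluent, x' = x + delta.
    """
    lower: State = field(default_factory=dict)
    upper: State = field(default_factory=dict)
    delta: State = field(default_factory=dict)

    def observe(self, pre: State, post: State) -> None:
        """Refine the model from one successful transition pre -> post."""
        deltas = {f: post[f] - v for f, v in pre.items()}
        # Reject data the constant-increment assumption cannot explain,
        # before mutating any of the model's fields.
        for f, d in deltas.items():
            if f in self.delta and abs(self.delta[f] - d) > 1e-9:
                raise ValueError(f"effect on {f!r} is not a constant increment")
        for f, v in pre.items():
            self.lower[f] = min(self.lower.get(f, v), v)
            self.upper[f] = max(self.upper.get(f, v), v)
        self.delta.update(deltas)

    def applicable(self, s: State) -> bool:
        """Only admit states inside the observed per-fluent ranges."""
        return bool(self.lower) and all(
            self.lower[f] <= s[f] <= self.upper[f] for f in self.lower
        )

    def apply(self, s: State) -> State:
        """Predict the successor state under the learned effects."""
        return {f: v + self.delta.get(f, 0.0) for f, v in s.items()}


m = ToyNumericActionModel()
m.observe({"fuel": 10.0, "x": 0.0}, {"fuel": 9.0, "x": 1.0})
m.observe({"fuel": 5.0, "x": 3.0}, {"fuel": 4.0, "x": 4.0})
print(m.applicable({"fuel": 7.0, "x": 2.0}))   # True
print(m.applicable({"fuel": 12.0, "x": 2.0}))  # False: fuel outside observed range
print(m.apply({"fuel": 7.0, "x": 2.0}))        # {'fuel': 6.0, 'x': 3.0}
```

The design choice here is to keep the learned precondition narrow: the model refuses to predict applicability outside the ranges it has seen, and it widens only as the policy collects more transitions.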
Positive Feedback Loop
One of the main advantages of RAMP is that these components form a positive feedback loop. Data gathered by the DRL policy refines the action model; the refined model lets the planner find plans more often, and those plans in turn provide additional training experience for the DRL policy. Each component thus benefits from improvements in the others.
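The feedback loop can be illustrated with a toy sketch. Everything here is a stand-in chosen for brevity, not RAMP's implementation: `CounterEnv` is a hypothetical one-fluent domain, tabular Q-learning replaces the DRL policy, breadth-first search over the learned model replaces a numeric planner, and the model only learns constant increments. What the sketch does show is the control flow described above: plan with the learned model when a plan exists, otherwise act with the policy, and feed every transition to both the model and the policy.

```python
import random
from collections import defaultdict, deque
from typing import Optional

ACTIONS = ("inc", "dec")


class CounterEnv:
    """Hypothetical toy numeric domain: reach x == goal; 'dec' requires x > 0."""

    def __init__(self, goal: int = 5):
        self.goal = goal
        self.x = 0

    def reset(self) -> int:
        self.x = 0
        return self.x

    def step(self, a: str) -> tuple[int, float, bool]:
        if a == "inc":
            self.x += 1
        elif self.x > 0:  # 'dec' is a no-op when its precondition fails
            self.x -= 1
        done = self.x == self.goal
        return self.x, (1.0 if done else -0.01), done


def plan(model: dict[str, int], start: int, goal: int,
         max_depth: int = 20) -> Optional[list[str]]:
    """Breadth-first search over the LEARNED model (stand-in for a numeric planner)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        if len(path) < max_depth:
            for a, d in model.items():
                if s + d not in seen:
                    seen.add(s + d)
                    frontier.append((s + d, path + [a]))
    return None


def q_update(Q, s, a, r, s2, done, alpha=0.5, gamma=0.95):
    """One tabular Q-learning step (stand-in for the DRL policy update)."""
    best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def ramp_style_loop(episodes=20, max_steps=50, eps=0.3, seed=0):
    rng, env = random.Random(seed), CounterEnv()
    model: dict[str, int] = {}  # learned effect of each action: x' = x + delta
    Q = defaultdict(float)
    planned = 0
    for _ in range(episodes):
        s = env.reset()
        p = plan(model, s, env.goal)  # plan only when the model allows it
        if p is not None:
            planned += 1
        for _ in range(max_steps):
            if p:
                a = p.pop(0)                             # follow the planner's plan
            elif rng.random() < eps:
                a = rng.choice(ACTIONS)                  # explore
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])  # exploit the policy
            s2, r, done = env.step(a)
            if s2 != s:
                model[a] = s2 - s                        # policy data refines the model
            q_update(Q, s, a, r, s2, done)               # planned steps also train the policy
            s = s2
            if done:
                break
    return model, Q, planned


model, Q, planned = ramp_style_loop()
print(model.get("inc"), planned)  # 1 19
```

With the default seed, the first episode has no model to plan with, so the policy acts; once `inc` has been observed, every later episode is solved by the planner, and those planned transitions keep updating the Q-table.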
Numeric PDDLGym Framework
To connect reinforcement learning with numeric planning, the authors also developed Numeric PDDLGym, an automated framework that converts numeric planning problems into Gym environments. This lets researchers and practitioners apply existing RL tools directly to numeric planning problems.
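The idea of exposing a numeric planning problem as a Gym environment can be sketched as follows. This does not reproduce Numeric PDDLGym's API, which the summary does not describe; `GroundAction` and `NumericPlanningEnv` are hypothetical names. The sketch assumes a flat vector of numeric fluents as the observation, one discrete action per grounded action, a no-op when an action's precondition does not hold, and a sparse reward of 1 on reaching the goal. It follows Gymnasium's conventions, where `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`, but does not import the library.

```python
from dataclasses import dataclass
from typing import Callable

Fluents = dict[str, float]


@dataclass(frozen=True)
class GroundAction:
    """A grounded numeric action: precondition test plus effect function."""
    name: str
    pre: Callable[[Fluents], bool]
    eff: Callable[[Fluents], Fluents]  # returns only the fluents it changes


class NumericPlanningEnv:
    """Hypothetical Gymnasium-style wrapper around one numeric planning problem.

    Mirrors Gymnasium's reset()/step() return conventions without depending on
    the library; this is NOT Numeric PDDLGym's actual API.
    """

    def __init__(self, init: Fluents, actions: list[GroundAction],
                 goal: Callable[[Fluents], bool], max_steps: int = 100):
        self.init, self.actions, self.goal = dict(init), list(actions), goal
        self.max_steps = max_steps
        self.fluent_names = sorted(init)  # fixed order -> flat numeric observation

    def _obs(self) -> list[float]:
        return [self.state[f] for f in self.fluent_names]

    def reset(self, seed=None, options=None):
        self.state, self.t = dict(self.init), 0
        return self._obs(), {}

    def step(self, action_index: int):
        act = self.actions[action_index]  # discrete action space over ground actions
        self.t += 1
        applicable = act.pre(self.state)
        if applicable:  # assumption: an inapplicable action is a no-op
            self.state = {**self.state, **act.eff(self.state)}
        terminated = self.goal(self.state)
        truncated = not terminated and self.t >= self.max_steps
        reward = 1.0 if terminated else 0.0  # assumption: sparse goal reward
        return self._obs(), reward, terminated, truncated, {"applicable": applicable}


fill = GroundAction("fill", lambda s: s["water"] + 2 <= s["cap"],
                    lambda s: {"water": s["water"] + 2})
pour = GroundAction("pour", lambda s: s["water"] >= 1,
                    lambda s: {"water": s["water"] - 1, "used": s["used"] + 1})
env = NumericPlanningEnv({"water": 0.0, "cap": 10.0, "used": 0.0}, [fill, pour],
                         goal=lambda s: s["used"] >= 3)
obs, info = env.reset()  # [10.0, 0.0, 0.0] (order: cap, used, water)
for i in [1, 0, 1, 1, 0, 1]:  # pour (inapplicable), fill, pour, pour, fill, pour
    obs, reward, terminated, truncated, info = env.step(i)
print(obs, reward, terminated)  # [10.0, 3.0, 1.0] 1.0 True
```

A real wrapper would subclass `gymnasium.Env` and declare `observation_space` and `action_space`; whether inapplicable actions should be no-ops, penalized, or masked is a design choice the summary does not specify.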
Experimental Results
Experiments on standard IPC numeric domains show that RAMP significantly outperforms Proximal Policy Optimization (PPO), a well-known DRL algorithm, in both solvability and plan quality. These results suggest that combining online action model learning with planning can make learning-based approaches to numeric planning more effective than DRL alone.
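The summary does not define how solvability and plan quality were measured. One common way to compute such metrics, sketched here under that assumption, is solvability as the fraction of benchmark problems for which a plan was found, and plan quality as plan length (or cost) compared only on problems that both methods solved. The function names are hypothetical, and the numbers below are invented placeholders for illustration, not the paper's results.

```python
from statistics import fmean
from typing import Optional, Sequence

# One entry per benchmark problem: plan length, or None if unsolved.
Results = Sequence[Optional[int]]


def solvability(results: Results) -> float:
    """Fraction of problems for which a plan was found."""
    return sum(r is not None for r in results) / len(results)


def mean_length_on_common(a: Results, b: Results):
    """Mean plan length of each method over problems BOTH solved.

    Restricting to commonly solved problems keeps the comparison fair;
    shorter plans are treated as higher quality (an assumption).
    """
    common = [i for i, (x, y) in enumerate(zip(a, b))
              if x is not None and y is not None]
    if not common:
        return None
    return fmean(a[i] for i in common), fmean(b[i] for i in common)


# Made-up placeholder numbers, NOT results from the paper.
method_a = [5, 7, None, 12]
method_b = [9, None, None, 15]
print(solvability(method_a), solvability(method_b))  # 0.75 0.5
print(mean_length_on_common(method_a, method_b))     # (8.5, 12.0)
```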
Conclusion
RAMP advances the online learning of numeric action models. By combining reinforcement learning, action model learning, and planning, it removes the dependence on expert traces that limits existing offline algorithms, opening the way to more adaptive automated planning. As research continues, similar hybrid approaches may also prove useful beyond numeric domains, in broader artificial intelligence and robotics applications.
