Advantage-Guided Diffusion for Model-Based Reinforcement Learning
arXiv:2604.09035v1
Abstract
Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent’s advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window.
Key Innovations
We develop two advantage-based guides:
- Sigmoid Advantage Guidance (SAG)
- Exponential Advantage Guidance (EAG)
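The summary names the two guides but does not give their formulas. The sketch below is a minimal illustration that assumes SAG weights a trajectory by σ(A/τ) and EAG by exp(A/τ), where A is the advantage and τ is a hypothetical temperature. Guidance would then follow the gradient of log w, chained through ∇ₓA. Both assumed weights increase in A, which is the property the theory section below relies on. The paper's exact forms and scaling may differ, and every function name here is hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical SAG weight (assumed form): sigma(A / tau), bounded in (0, 1).
def sag_weight(adv: float, temperature: float = 1.0) -> float:
    return sigmoid(adv / temperature)


# Hypothetical EAG weight (assumed form): exp(A / tau), unbounded above.
def eag_weight(adv: float, temperature: float = 1.0) -> float:
    return math.exp(adv / temperature)


# d/dA log sigma(A / tau) = (1 - sigma(A / tau)) / tau
def sag_log_weight_grad(adv: float, temperature: float = 1.0) -> float:
    return (1.0 - sigmoid(adv / temperature)) / temperature


# d/dA log exp(A / tau) = 1 / tau, independent of A
def eag_log_weight_grad(adv: float, temperature: float = 1.0) -> float:
    return 1.0 / temperature


if __name__ == "__main__":
    for a in (-2.0, 0.0, 2.0):
        print(a, sag_weight(a), eag_weight(a),
              sag_log_weight_grad(a), eag_log_weight_grad(a))
```

Under these assumed forms, the log-weight gradient for SAG shrinks toward zero as the advantage grows, so trajectories that are already good receive a weaker push. The log-weight gradient for EAG is constant in A, so its strength depends only on τ.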
Theoretical Foundations
We prove that guiding a diffusion model with SAG or EAG amounts to reweighted sampling of trajectories, with weights that increase in the state-action advantage. Under standard assumptions, this reweighting implies policy improvement.
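This toy finite-sample illustration shows why weights that increase in advantage help. It is not the paper's proof. Suppose a base sampler puts equal mass on N candidate trajectories, and each trajectory is reduced to a single scalar advantage (a simplifying assumption). Reweighting by any increasing w(A) then gives a mean advantage at least as high as the base mean, because an increasing function of A is nonnegatively correlated with A. The helper names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reweighted_mean_advantage(
    advantages: Sequence[float], weight_fn: Callable[[float], float]
) -> float:
    """Mean advantage under p'(i) proportional to weight_fn(A_i), with a uniform base p."""
    weights = [weight_fn(a) for a in advantages]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, advantages)) / total


def reweighted_resample(
    items: Sequence[T],
    advantages: Sequence[float],
    weight_fn: Callable[[float], float],
    k: int,
    rng: random.Random,
) -> List[T]:
    """Draw k items with probability proportional to weight_fn(advantage)."""
    weights = [weight_fn(a) for a in advantages]
    return rng.choices(items, weights=weights, k=k)


if __name__ == "__main__":
    advantages = [-1.0, 0.0, 0.5, 2.0]
    base_mean = sum(advantages) / len(advantages)
    eag = lambda a: math.exp(a)                  # exponential weight, tau = 1 (assumed)
    sag = lambda a: 1.0 / (1.0 + math.exp(-a))   # sigmoid weight, tau = 1 (assumed)
    print(base_mean,
          reweighted_mean_advantage(advantages, eag),
          reweighted_mean_advantage(advantages, sag))
```

A constant weight recovers the base mean exactly. A steeper increasing weight, such as the exponential one, moves more probability mass onto the highest-advantage candidates.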
Implementation and Integration
AGD integrates with PolyGRAD-style architectures by guiding only the state components of generated trajectories, while action generation stays conditioned on the policy. It requires no change to the diffusion training objective, which makes it a practical addition to existing frameworks.
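The sketch below shows one reverse-diffusion step that guides states but not actions. It assumes a classifier-guidance-style update, x_{t-1} ~ N(μ + s·σ²·∇ log w(x_t), σ²I), applied only to state entries. After that update, actions are re-drawn from a caller-supplied policy given the new states. The function name, its arguments, and the way actions are re-drawn are all hypothetical simplifications, not PolyGRAD's or the paper's actual interface.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def guided_reverse_step(
    states_t: Sequence[Vector],      # current noisy states x_t in the segment
    states_mean: Sequence[Vector],   # denoiser's predicted mean mu for each state
    sigma: float,                    # reverse-step noise std at this diffusion step
    grad_log_weight: Callable[[Sequence[Vector]], List[Vector]],  # hypothetical guide
    guidance_scale: float,           # s: strength of the advantage guidance
    policy: Callable[[Vector], Vector],  # hypothetical policy: state -> action
    rng: random.Random,
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical step: shift states by the guidance term, keep actions policy-driven."""
    grads = grad_log_weight(states_t)  # d log w / d x, evaluated at x_t
    new_states: List[Vector] = []
    for mu, g in zip(states_mean, grads):
        new_states.append([
            m + guidance_scale * sigma ** 2 * gi + sigma * rng.gauss(0.0, 1.0)
            for m, gi in zip(mu, g)
        ])
    # Actions never receive the guidance term; they come from the policy.
    actions = [policy(s) for s in new_states]
    return new_states, actions


if __name__ == "__main__":
    rng = random.Random(0)
    x_t = [[0.0, 0.0], [0.1, -0.2]]
    mu = [[0.05, 0.0], [0.1, -0.1]]
    # Toy guide: log w(x) grows with the first state coordinate, so its gradient is (1, 0).
    grad = lambda states: [[1.0, 0.0] for _ in states]
    policy = lambda s: [-0.5 * s[0]]  # toy deterministic policy
    print(guided_reverse_step(x_t, mu, 0.1, grad, 2.0, policy, rng))
```

Because the guidance term is scaled by σ², its influence fades as the noise level drops near the end of sampling. The choice to leave actions unguided mirrors the policy-conditioned action generation described above.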
Empirical Results
Experiments on MuJoCo control tasks (HalfCheetah, Hopper, Walker2D, and Reacher) show:
- Better sample efficiency than PolyGRAD.
- Higher final return than online Diffuser-style reward guides.
- Better performance than model-free baselines such as PPO and TRPO, with margins of 2x in some cases.
Conclusion
These findings indicate that advantage-aware guidance is a simple and effective remedy for short-horizon myopia in diffusion-model MBRL. The guide relies on advantage estimates, which reflect return beyond the generated window. As a result, AGD-MBRL biases generated trajectories toward higher long-term return, and this bias translates into the gains reported above.
For further details, readers are encouraged to access the full paper on arXiv.
