Optimize autoregressive models with Reward Weighted Classifier-Free Guidance for faster policy improvement and flexible reward adaptation without retrainin...
Discover TeLAPA, a framework that uses diverse policy neighborhoods to preserve policy plasticity and improve adaptation in continual reinforcement learning.