Discover EVPO, an adaptive policy optimization method that improves LLM post-training by balancing critic use to reduce variance and boost performance.
Discover TeLAPA, a framework that uses diverse policy neighborhoods to preserve policy plasticity and boost adaptation in continual reinforcement learning.