Nemotron-Cascade: Scalable Cascade RL for Reasoning Models

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Summary of arXiv:2512.13607v2

Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging.
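One way such variability can slow training is easy to see with a toy calculation. The sketch below assumes a synchronous rollout step with one worker per sample, so the step lasts as long as its slowest sample. The class name, function names, and all timings are made up for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainProfile:
    """Hypothetical per-sample cost of one domain (illustrative numbers only)."""
    name: str
    gen_seconds: float     # assumed time to generate one response
    verify_seconds: float  # assumed time to verify or score one response

    @property
    def sample_seconds(self) -> float:
        return self.gen_seconds + self.verify_seconds


def sync_step_seconds(batch):
    """A synchronous step waits for its slowest sample."""
    return max(d.sample_seconds for d in batch)


def idle_fraction(batch):
    """Share of worker time spent waiting, assuming one worker per sample."""
    step = sync_step_seconds(batch)
    busy = sum(d.sample_seconds for d in batch)
    return 1.0 - busy / (step * len(batch))


# Made-up timings: short chat responses vs. long code responses with slow tests.
chat = DomainProfile("chat", gen_seconds=5.0, verify_seconds=1.0)
code = DomainProfile("code", gen_seconds=40.0, verify_seconds=20.0)

mixed_batch = [chat, chat, chat, code]
chat_only_batch = [chat, chat, chat, chat]

print(f"mixed batch idle fraction:     {idle_fraction(mixed_batch):.3f}")
print(f"chat-only batch idle fraction: {idle_fraction(chat_only_batch):.3f}")
```

Under these assumed numbers, the mixed batch leaves most worker time idle while the single-domain batch leaves none. Real systems can hide part of this cost with asynchronous rollouts, so treat the result as an intuition aid rather than a measurement.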

Introduction to Cascade RL

In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop Nemotron-Cascade, a general-purpose reasoning model that operates in both instruct and deep thinking modes with no performance gap relative to a thinking-only counterpart. Unlike conventional approaches that blend heterogeneous prompts from different domains into a single RL run, Cascade RL trains on one domain at a time.

Key Features of Nemotron-Cascade

Sequential, Domain-Wise RL: Cascade RL runs RL one domain at a time rather than on a blended prompt mix, which reduces engineering complexity.
State-of-the-Art Performance: The resulting model delivers state-of-the-art performance across a wide range of benchmarks.
Enhanced Reasoning Abilities: Reinforcement Learning from Human Feedback (RLHF) for alignment, when used as a pre-step, boosts the model’s reasoning ability far beyond what preference optimization alone would suggest.
Robust Performance Maintenance: Subsequent domain-wise stages of RL with verifiable rewards (RLVR) rarely degrade the benchmark performance achieved in earlier domains and may even improve it.
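The sequential schedule described in the list above can be sketched as a simple loop: train on one domain, then re-evaluate every earlier domain to check for regressions. This is a minimal toy sketch and not the authors' implementation. `train_stage`, `evaluate`, and `cascade_rl` are hypothetical names, the policy is a stand-in dictionary of per-domain scores, and the stage names after `rlhf` are placeholders rather than the paper's actual ordering.

```python
from typing import Dict, List, Tuple

Policy = Dict[str, float]  # toy stand-in: one skill score per domain


def train_stage(policy: Policy, domain: str, gain: float = 0.1) -> Policy:
    """Hypothetical stand-in for one domain-wise RL stage.

    A real stage would run RL on prompts from `domain` only; here we
    just bump that domain's score and leave the others untouched.
    """
    updated = dict(policy)
    updated[domain] = updated.get(domain, 0.0) + gain
    return updated


def evaluate(policy: Policy, domains: List[str]) -> Dict[str, float]:
    """Hypothetical benchmark evaluation on the given domains."""
    return {d: policy.get(d, 0.0) for d in domains}


def cascade_rl(
    policy: Policy, stages: List[str], tolerance: float = 0.0
) -> Tuple[Policy, List[dict]]:
    """Run stages in order and record regressions on earlier domains."""
    seen: List[str] = []
    history: List[dict] = []
    for domain in stages:
        before = evaluate(policy, seen)
        policy = train_stage(policy, domain)
        after = evaluate(policy, seen)
        regressed = [d for d in seen if after[d] < before[d] - tolerance]
        history.append({"stage": domain, "regressed": regressed})
        seen.append(domain)
    return policy, history


# RLHF first as the pre-step, then placeholder RLVR domains.
final_policy, history = cascade_rl({}, ["rlhf", "math", "code"])
for entry in history:
    print(entry)
```

Because the toy `train_stage` only touches the current domain, no regressions are ever recorded here. In a real run, the `regressed` list is where any degradation on earlier domains would show up after each stage.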

Performance Metrics

Our 14B model, after reinforcement learning, outperforms its supervised fine-tuning (SFT) teacher, DeepSeek-R1-0528, on LiveCodeBench v5, v6, and Pro. It also achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI).

Conclusion and Future Work

Nemotron-Cascade shows that Cascade RL is a practical way to build general-purpose reasoning models. Training one domain at a time sidesteps much of the complexity that arises when domains differ in response length and verification latency.

We are committed to transparency in our research and development process and will be sharing our training recipes and data methodologies to foster further exploration and advancements in the field.
