Nemotron 3 Super: Efficient 120B Parameter AI Model

Nemotron 3 Super: A Breakthrough in AI Model Architecture

Nemotron 3 Super, described in the recent pre-print arXiv:2604.12374v1, is a hybrid Mamba-Attention Mixture-of-Experts model designed for agentic reasoning. It has 120 billion parameters in total, of which 12 billion are active for any given token, which keeps the per-token compute well below that of a dense model of the same size.
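To see why only a fraction of the parameters is active per token, here is a minimal sketch of generic top-k expert routing in Python. It is not the LatentMoE routing from the pre-print, which this article does not describe; the expert count (8), the value of k (2), and the function names are hypothetical choices for illustration.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params

# Hypothetical toy setup: 8 experts, 2 of them run for this token.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(8)]
chosen = route_top_k(scores, k=2)
print("experts used for this token:", chosen)

# Figures from the article: 12B active out of 120B total.
print("active fraction:", active_fraction(12e9, 120e9))
```

Only the selected experts run for a given token, so the remaining expert weights sit idle for that step. With the article's figures, about one tenth of the parameters take part in each token's forward pass.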

Key Features of Nemotron 3 Super

Three features distinguish Nemotron 3 Super from earlier models in the family:

  • Pre-training in NVFP4: This is the first model in the Nemotron 3 family to be pre-trained in NVFP4, NVIDIA's 4-bit floating-point format, rather than in a higher-precision format.
  • LatentMoE Architecture: LatentMoE is a new Mixture-of-Experts architecture designed to maximize accuracy per floating-point operation (FLOP) and per parameter.
  • MTP Layers for Inference Acceleration: The model includes MTP (Multi-Token Prediction) layers, which accelerate inference through native speculative decoding.
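The idea behind speculative decoding can be sketched with a toy example in Python. A cheap predictor drafts several tokens, the full model checks them, and the longest agreeing prefix is kept. In Nemotron 3 Super the drafts come from the model's own MTP layers; in this sketch a separate toy function stands in for them, and both "models", the greedy acceptance rule, and the draft length are assumptions made purely for illustration.

```python
def target_next(context):
    """Toy 'full' model: next token is the sum of the last two tokens mod 10."""
    return (context[-1] + context[-2]) % 10

def draft_next(context):
    """Toy cheap predictor: matches the target except when the last token is 7."""
    if context[-1] == 7:
        return 0
    return (context[-1] + context[-2]) % 10

def greedy_decode(prompt, num_new_tokens):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(prompt, num_new_tokens, draft_len=4):
    """Greedy speculative decoding; returns (tokens, number of verify passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    verify_passes = 0
    while len(tokens) < goal:
        # 1. Draft a short continuation cheaply.
        draft, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify. A real system scores all drafted positions in a single
        #    forward pass of the full model; here we count it as one pass.
        verify_passes += 1
        ctx = list(tokens)
        for t in draft:
            expected = target_next(ctx)
            ctx.append(expected)      # always keep the target's token
            if expected != t:         # first mismatch ends this round
                break
        else:
            ctx.append(target_next(ctx))  # all accepted: one bonus token
        tokens = ctx
    return tokens[:goal], verify_passes

spec_tokens, passes = speculative_decode([1, 1], 20)
assert spec_tokens == greedy_decode([1, 1], 20)  # same output as plain greedy
print("verify passes:", passes, "for 20 new tokens")
```

Because every kept token is the one the full model would have chosen, the output matches plain greedy decoding. The saving comes from needing fewer passes of the full model whenever the drafts are mostly right.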

Training and Performance Metrics

Nemotron 3 Super was pre-trained on 25 trillion tokens and then post-trained with supervised fine-tuning (SFT) and reinforcement learning (RL).

The final model supports context lengths of up to 1 million tokens and, according to the pre-print, achieves competitive accuracy on common benchmarks.

Increased Inference Throughput

Nemotron 3 Super also delivers higher inference throughput than comparable models. The pre-print reports up to 2.2 times the inference throughput of GPT-OSS-120B and up to 7.5 times that of Qwen3.5-122B. For developers and researchers, this means more tokens generated per unit of time at a similar model size.
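As a rough illustration of what these ratios mean in practice, the Python sketch below converts a throughput speedup into the share of baseline time needed to generate the same number of tokens. The speedups are the "up to" figures quoted above, so the results are best-case values; the function name is hypothetical.

```python
def relative_time(speedup):
    """Fraction of the baseline's wall-clock time needed for the same output."""
    return 1.0 / speedup

# Best-case ("up to") throughput ratios quoted in the article.
reported = [("GPT-OSS-120B", 2.2), ("Qwen3.5-122B", 7.5)]

for baseline, speedup in reported:
    share = relative_time(speedup)
    print(f"vs {baseline}: about {share:.1%} of the baseline time")
```

A 2.2x throughput gain therefore cuts generation time to a little under half of the baseline, and a 7.5x gain to roughly one seventh, assuming the workload matches the conditions under which the ratios were measured.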

Open Source Availability

The datasets used to train Nemotron 3 Super, along with the base, post-trained, and quantized checkpoints, are openly available on Hugging Face. Other researchers and developers can therefore inspect the training data, evaluate the model, and build on it.

Conclusion

Nemotron 3 Super combines a hybrid Mamba-Attention Mixture-of-Experts architecture with NVFP4 pre-training and built-in speculative decoding. Its openly released checkpoints and datasets, together with its reported throughput gains, make it a useful starting point for further research and applications in agentic reasoning.


Lazarus Omolua
https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
