Scalable Pretraining of Large MoE Language Models on Aurora

Scalable Pretraining of Large Mixture of Experts Language Models on the Aurora Supercomputer

Pretraining Large Language Models (LLMs) is among the most compute-intensive workloads in artificial intelligence. A recent study, arXiv:2604.00785v1, reports on pretraining such models on the Aurora supercomputer, an exascale machine equipped with 127,488 Intel Ponte Vecchio (PVC) GPU tiles.

Introduction

Pretraining LLMs from scratch demands immense computational resources. Aurora provides a platform for this task and lets researchers study how training scales. The study demonstrates training runs that span thousands of GPU tiles.

Key Developments

Central to the research is Optimus, an in-house training library that supports the standard techniques for large-model training. Using it, the researchers pretrained the following models:

  • Mula-1B: a 1-billion-parameter dense model pretrained on 3,072 GPU tiles using the full 4 trillion tokens of the OLMoE-mix-0924 dataset.
  • Mula-7B-A1B: a 7-billion-parameter Mixture of Experts (MoE) model, also pretrained from scratch on the same dataset.
  • Mula-20B-A2B, Mula-100B-A7B, and Mula-220B-A10B: three larger MoE models, each pretrained for 100 billion tokens to demonstrate that the training process scales.
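The model names above appear to encode two sizes. The sketch below assumes, without confirmation from the article, that the first number is the total parameter count and that an "A<k>B" suffix is the number of parameters active per token, as in other MoE naming schemes. The helper `parse_mula_name` is hypothetical and exists only to make the sparsity of each model explicit.

```python
import re


def parse_mula_name(name: str) -> tuple[float, float]:
    """Return (total_billions, active_billions) parsed from a model name.

    Assumption (not stated in the article): 'A<k>B' means k billion
    parameters are active per token. Dense names carry no such suffix,
    so all parameters are treated as active.
    """
    match = re.fullmatch(r"Mula-(\d+)B(?:-A(\d+)B)?", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    total = float(match.group(1))
    active = float(match.group(2)) if match.group(2) else total
    return total, active


if __name__ == "__main__":
    names = ["Mula-1B", "Mula-7B-A1B", "Mula-20B-A2B",
             "Mula-100B-A7B", "Mula-220B-A10B"]
    for name in names:
        total, active = parse_mula_name(name)
        # Fraction of the model's parameters used for each token.
        print(f"{name}: {active:g}B of {total:g}B active ({active / total:.1%})")
```

Under that reading, the larger the MoE model, the smaller the fraction of its parameters used per token, which is what lets total size grow faster than per-token compute.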

Computational Efficiency

The researchers scaled training of their largest model, Mula-220B-A10B, from 384 to 12,288 GPU tiles, a 32-fold increase, and report a scaling efficiency of around 90% at the largest tile count. High scaling efficiency matters because it means that adding hardware shortens training time almost proportionally.
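To make the 90% figure concrete, here is a minimal sketch of how scaling efficiency is commonly computed: measured speedup divided by ideal (linear) speedup. The article does not say which definition or which throughput metric the study uses, so the function `scaling_efficiency` and the variable names are assumptions for illustration. Only the tile counts and the approximate 90% value come from the text.

```python
def scaling_efficiency(base_tiles: int, base_throughput: float,
                       tiles: int, throughput: float) -> float:
    """Measured speedup divided by ideal linear speedup.

    Throughput can be any rate that should grow with hardware,
    for example tokens processed per second.
    """
    if base_tiles <= 0 or tiles <= 0 or base_throughput <= 0:
        raise ValueError("tile counts and base throughput must be positive")
    ideal_speedup = tiles / base_tiles
    measured_speedup = throughput / base_throughput
    return measured_speedup / ideal_speedup


# Figures from the article: 384 -> 12,288 tiles at roughly 90% efficiency.
BASE_TILES = 384
MAX_TILES = 12_288
REPORTED_EFFICIENCY = 0.90  # "around 90%", so treat as approximate

ideal = MAX_TILES / BASE_TILES          # linear-scaling speedup
implied = REPORTED_EFFICIENCY * ideal   # speedup implied by the reported figure

print(f"ideal speedup: {ideal:.0f}x")
print(f"implied speedup at ~90% efficiency: {implied:.1f}x")
```

Read this way, a 32x increase in tiles at about 90% efficiency corresponds to roughly a 28.8x increase in throughput over the 384-tile baseline.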

Performance Improvements

The runtime performance of MoE models improved significantly through custom GPU kernels written for expert computation. In addition, a novel EP-Aware sharded optimizer contributed to training speedups of up to 1.71x. These gains reduce the time and compute cost of each training run.
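For readers unfamiliar with what "expert computation" involves, the following is a plain-Python reference sketch of a generic MoE layer: each token is routed to its top-k experts, tokens are grouped per expert, each expert processes its group as one batch, and the results are combined using the gate weights. It is not the study's kernel or the Optimus API. The function names, the toy scalar tokens, and the choice to renormalize the top-k gate weights are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(tokens, router_logits, experts, top_k):
    """Toy MoE layer over scalar tokens.

    tokens:        list of floats, one per token
    router_logits: per-token list of scores, one score per expert
    experts:       list of callables mapping a batch (list) to a list
    top_k:         number of experts each token is sent to
    """
    # 1) Routing: pick the top-k experts per token and renormalize their gates.
    groups = defaultdict(list)  # expert index -> [(token index, gate weight)]
    for t, logits in enumerate(router_logits):
        probs = softmax(logits)
        chosen = sorted(range(len(probs)), key=lambda e: probs[e],
                        reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)
        for e in chosen:
            groups[e].append((t, probs[e] / norm))

    # 2) Grouped expert computation: one batched call per expert.
    #    This per-expert batching is the part a custom GPU kernel would target.
    outputs = [0.0] * len(tokens)
    for e, assigned in groups.items():
        batch = [tokens[t] for t, _ in assigned]
        results = experts[e](batch)
        # 3) Combine: scatter results back, weighted by the gate values.
        for (t, weight), y in zip(assigned, results):
            outputs[t] += weight * y
    return outputs


if __name__ == "__main__":
    toy_experts = [
        lambda batch: [2.0 * x for x in batch],  # expert 0 doubles
        lambda batch: [-x for x in batch],       # expert 1 negates
    ]
    print(moe_forward([1.0, 2.0], [[10.0, 0.0], [0.0, 10.0]],
                      toy_experts, top_k=1))
```

Because different experts receive different numbers of tokens, the per-expert batches have uneven sizes, which is one reason general-purpose batched matrix routines can be a poor fit and specialized kernels can help.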

Reliability and Fault Tolerance

Optimus also includes reliability and fault-tolerance features aimed at improving training stability and continuity at scale. These matter for long-running training jobs, which are often susceptible to interruptions.
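The article does not describe which fault-tolerance mechanisms Optimus implements. One widely used building block is periodic checkpointing with resume, sketched below under that assumption. All names here (`save_checkpoint`, `load_checkpoint`, `train`, the JSON state layout, and the `fail_at` parameter used to simulate an interruption) are hypothetical, and the "training step" is a stand-in computation.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: write a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on the same filesystem, so a crash mid-write
    # never leaves a half-written checkpoint at `path`.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, save_every, fail_at=None):
    """Run (or resume) a toy training loop with periodic checkpoints."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated interruption")
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # stand-in for a real step
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "ckpt.json")
        try:
            train(ckpt, total_steps=10, save_every=5, fail_at=7)
        except RuntimeError as err:
            print("interrupted:", err)
        # Resuming picks up from the last saved step rather than step 0.
        print("resumed from step", load_checkpoint(ckpt)["step"])
        print("final state:", train(ckpt, total_steps=10, save_every=5))
```

The trade-off is the checkpoint interval: saving more often loses less work on failure but spends more time on I/O, which becomes significant when the state being saved is a model with hundreds of billions of parameters.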

Conclusion

The study demonstrates Aurora's potential for large-scale LLM pretraining. With the introduction of Optimus and the pretraining of models ranging from a 1-billion-parameter dense model to a 220-billion-parameter MoE model, the work provides a reference point for future large-scale training efforts. As demand for more capable and efficient LLMs grows, such advances in training infrastructure are likely to become increasingly important.


Lazarus Omolua
