ScaleGNN: Scalable Mini-batch GNN Training with 4D Parallelism

Date:

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

Summary: arXiv:2604.02651v1 Announce Type: cross

Abstract: Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely large graphs requires distributed training, and mini-batching with sampling is a popular approach for parallelizing GNN training. Existing distributed mini-batch approaches have significant performance bottlenecks due to expensive sampling methods and limited scaling when using data parallelism.

In this work, we present ScaleGNN, a 4D parallel framework for scalable mini-batch GNN training that combines communication-free distributed sampling, 3D parallel matrix multiplication (PMM), and data parallelism. The innovations introduced by ScaleGNN address the limitations of current methodologies, enabling more efficient and scalable training of GNNs.

Key Innovations of ScaleGNN

  • Communication-free Distributed Sampling: ScaleGNN introduces a uniform vertex sampling algorithm, allowing each process (GPU device) to construct its local mini-batch, i.e., subgraph partitions without any inter-process communication. This innovation significantly reduces the overhead associated with traditional sampling methods.
  • 3D Parallel Matrix Multiplication (PMM): The framework employs a 3D PMM strategy, which enables the scaling of mini-batch training to larger GPU counts than conventional data parallelism, while also minimizing communication overheads.
  • Overlapping Sampling with Training: To further enhance efficiency, additional optimizations have been implemented to overlap the sampling process with training phases, thereby maximizing resource utilization.
  • Lower Precision Data Transmission: ScaleGNN reduces communication overhead by sending data in lower precision, which not only speeds up the process but also maintains the integrity of training outcomes.
  • Kernel Fusion: The framework integrates kernel fusion techniques to streamline operations, which contributes to the overall performance improvements.
  • Communication-Computation Overlap: By overlapping communication and computation tasks, ScaleGNN ensures that GPU resources are fully engaged, leading to enhanced throughput.

Performance Evaluation

ScaleGNN has been rigorously evaluated on five distinct graph datasets, demonstrating robust scalability. Testing was conducted across several high-performance computing environments:

  • 2048 GPUs on the Perlmutter system
  • 2048 GCDs on the Frontier system
  • 1024 GPUs on the Tuolumne system

Among these tests, ScaleGNN achieved a remarkable 3.5x end-to-end training speedup over the state-of-the-art (SOTA) baseline on the ogbn-products dataset. This significant improvement underscores the effectiveness of the proposed framework in handling large-scale GNN training tasks efficiently.

Conclusion

In summary, ScaleGNN represents a substantial advancement in the field of GNN training, addressing key bottlenecks associated with existing methods. Its innovative approaches to sampling, parallelism, and resource utilization pave the way for more scalable and efficient training of graph neural networks, making it a valuable contribution to the realm of machine learning and artificial intelligence.


Related AI Insights

Lazarus Omolua
Lazarus Omoluahttps://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

Subscribe

Popular

More like this
Related

How Business Ops Teams Boost Productivity with Codex

Discover how business operations teams use Codex to streamline documentation, enhance collaboration, and improve decision-making with AI-powered automation...

OpenAI Partners with Malta to Offer ChatGPT Plus Nationwide

OpenAI and Malta team up to provide free ChatGPT Plus access and AI training to all citizens, promoting digital literacy and responsible AI use.

Critical Linux Kernel Flaw Risks SSH Host Key Theft

A critical Linux kernel flaw risks stolen SSH host keys. Learn how to protect your systems and stay secure until patches are widely available.

Top External Hard Drives 2026: Expert Reviews & Buying Guide

Discover the best external hard drives of 2026 with expert reviews. Find top picks for speed, durability, and security to suit all storage needs.