SISA: Efficient GEMM Acceleration with Scale-In Systolic Array


SISA: A Scale-In Systolic Array for GEMM Acceleration

arXiv:2603.29913v1 (Announce Type: cross)

Abstract: The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. Thus, most systems are equipped with dedicated matrix hardware accelerators based on square Systolic Arrays (SAs) of Processing Elements (PEs). While this organization was effective for traditional Deep Neural Networks (DNNs), LLMs introduce input-dependent and highly skewed matrices, leading to underutilized SA resources. To address this challenge, we propose SISA (Scale-In Systolic Array), a novel SA architecture that partitions the traditional square array into horizontal rectangular slabs. With minimal overhead, SISA exposes parallelism through independently scheduled slabs for efficient execution of small or skewed matrix shapes, while retaining full-array operation for large GEMMs. SISA achieves up to 8.52x speedup and 93% energy-delay-product (EDP) reduction for representative LLMs compared to a state-of-the-art monolithic SA with the same number of PEs.

Introduction

As artificial intelligence workloads grow, efficient hardware for machine learning has become a priority. Large Language Models (LLMs) in particular demand substantial compute, and much of that work reduces to General Matrix-Matrix Multiplication (GEMM). Efficient GEMM hardware accelerators are therefore essential.

Systolic Arrays and Their Limitations

Systolic Arrays (SAs) have been widely adopted because they execute GEMM operations efficiently. However, the traditional square SA is a poor fit for LLMs, whose matrices are input-dependent and often highly skewed. When one matrix dimension is much smaller than the array's side length, only a fraction of the Processing Elements (PEs) receive useful work, which lowers both performance and energy efficiency.
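
To make this underutilization concrete, the following Python sketch estimates what fraction of PEs do useful work when an m x n output matrix is tiled onto an s x s array. It assumes a simplified output-stationary mapping (one output element per PE per tile) and ignores pipeline fill and drain; the function name and the example shapes are illustrative, not taken from the paper.

```python
import math


def spatial_utilization(m: int, n: int, s: int) -> float:
    """Fraction of PE slots holding a useful output element when an m x n
    output is tiled onto an s x s array.

    Simplified model (assumption): output-stationary mapping, one output
    element per PE per tile, pipeline fill/drain ignored.
    """
    row_tiles = math.ceil(m / s)
    col_tiles = math.ceil(n / s)
    useful = m * n                                # output elements actually needed
    provisioned = row_tiles * col_tiles * s * s   # PE slots occupied across all tiles
    return useful / provisioned


# A square GEMM that matches the array tiles fills every PE...
print(spatial_utilization(m=256, n=256, s=128))   # 1.0
# ...while a skewed GEMM with only 8 rows leaves most PEs idle.
print(spatial_utilization(m=8, n=4096, s=128))    # 0.0625
```

In this toy model, the skewed shape uses only 8 of 128 PE rows in every tile, so about 94% of the array sits idle.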

Introducing SISA

The Scale-In Systolic Array (SISA) architecture addresses these limitations by partitioning the square array into horizontal rectangular slabs. Each slab can be scheduled independently, so small or skewed matrix shapes can execute efficiently in parallel, while the slabs can still operate together as a full array for large GEMMs. According to the authors, this flexibility comes with minimal overhead.
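
The slab idea can be illustrated with a toy pass-count model in Python. This is a hypothetical sketch, not SISA's actual scheduler: it assumes an output-stationary mapping, k equal horizontal slabs of s // k rows, passes of equal duration, and no memory-bandwidth limits. The function names and the mode-selection policy in `best_passes` are invented for illustration.

```python
import math


def full_array_passes(m: int, n: int, s: int) -> int:
    """Tile passes for an m x n output on a monolithic s x s array."""
    return math.ceil(m / s) * math.ceil(n / s)


def slab_passes(m: int, n: int, s: int, k: int) -> int:
    """Tile passes when the s x s array is split into k horizontal slabs of
    (s // k) rows x s columns, each slab working on its own output tile.
    Hypothetical model: all slabs run concurrently and passes take equal time."""
    assert s % k == 0, "toy model assumes equal-height slabs"
    h = s // k                                    # PE rows per slab
    tiles = math.ceil(m / h) * math.ceil(n / s)   # (s // k) x s output tiles
    return math.ceil(tiles / k)                   # k slabs process tiles in parallel


def best_passes(m: int, n: int, s: int, k: int) -> int:
    """Hypothetical policy: pick whichever mode needs fewer passes."""
    return min(full_array_passes(m, n, s), slab_passes(m, n, s, k))


# Skewed 8 x 4096 output on a 128 x 128 array split into 16 slabs of 8 rows.
print(full_array_passes(8, 4096, 128))   # 32
print(best_passes(8, 4096, 128, 16))     # 2
# Large square output: both modes need the same number of passes.
print(best_passes(256, 256, 128, 16))    # 4
```

Under these assumptions, the skewed shape needs 2 passes instead of 32, while the large GEMM gains nothing from slabs and can simply run in full-array mode. This idealized ratio should not be read as a prediction of the reported results, which depend on dataflow, memory, and scheduling details the sketch omits.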

Performance Metrics

The authors report substantial gains over a state-of-the-art monolithic SA with the same number of PEs. Key metrics include:

  • Speedup: up to 8.52x faster execution for representative LLMs compared to a state-of-the-art monolithic SA with the same number of PEs.
  • Energy-Delay Product (EDP) Reduction: a 93% reduction in EDP for representative LLMs. Because EDP multiplies energy by execution time, this reduction reflects improvements in both energy consumption and latency.
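
Because EDP combines two quantities, a short Python sketch helps interpret the percentage. EDP is energy multiplied by execution time, so a 93% reduction leaves 7% of the baseline EDP, roughly 14.3x lower. The function names and the example values below are hypothetical and are not measurements from the paper.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy (J) multiplied by execution time (s)."""
    return energy_j * delay_s


def edp_reduction(base_energy: float, base_delay: float,
                  new_energy: float, new_delay: float) -> float:
    """Fractional EDP reduction of a new design relative to a baseline."""
    return 1.0 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


# Hypothetical design: half the energy and half the runtime of the baseline.
print(edp_reduction(1.0, 1.0, 0.5, 0.5))  # 0.75

# A 93% EDP reduction leaves 7% of the baseline EDP.
remaining = 1.0 - 0.93
print(round(1.0 / remaining, 1))          # 14.3
```

The first example shows why EDP falls faster than either factor alone: halving both energy and runtime cuts EDP by 75%.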

Conclusion

In summary, the Scale-In Systolic Array (SISA) offers a different way to organize GEMM acceleration for modern AI workloads. By splitting a square systolic array into independently schedulable slabs, it recovers utilization lost to the skewed matrix shapes common in LLMs, improving both performance and energy efficiency while still supporting full-array operation for large GEMMs. As AI workloads continue to evolve, architectures like SISA may play an important role in future hardware accelerators.



Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
