Ge2mS-T: Ultra-Efficient Spiking Transformer Grouping

Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

Summary: arXiv:2604.08894v1 Announce Type: cross

Abstract: Spiking Neural Networks (SNNs) offer superior energy efficiency over Artificial Neural Networks (ANNs). However, they encounter significant deficiencies in training and inference metrics when applied to Spiking Vision Transformers (S-ViTs). Existing paradigms including ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP) suffer from inherent limitations, precluding concurrent optimization of memory, accuracy and energy consumption. To address these issues, we propose Ge$^\text{2}$mS-T, a novel architecture implementing grouped computation across temporal, spatial and network structure dimensions.

Specifically, we introduce the Grouped-Exponential-Coding-based IF (ExpG-IF) model, enabling lossless conversion with constant training overhead and precise regulation for spike patterns. Additionally, we develop Group-wise Spiking Self-Attention (GW-SSA) to reduce computational complexity via multi-scale token grouping and multiplication-free operations within a hybrid attention-convolution framework. Experiments confirm that our method can achieve superior performance with ultra-high energy efficiency on challenging benchmarks.

To the best of our knowledge, this is the first work to systematically establish multi-dimensional grouped computation for resolving the triad of memory overhead, learning capability and energy budget in S-ViTs.
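The abstract does not spell out how ExpG-IF works, so the following is only a minimal sketch of the general idea of exponential coding in an integrate-and-fire neuron, not the paper's model. It assumes that the firing threshold halves at every time step, so a spike at step t carries weight theta * 2^-(t+1) and T steps behave like a T-bit binary code. The names `exp_coded_if`, `decode`, `T` and `theta` are hypothetical, and the grouping part of ExpG-IF is not modeled here.

```python
import numpy as np

def exp_coded_if(x, T=4, theta=1.0):
    """Encode activations x in [0, theta) as T binary spikes per neuron.

    Assumption (not from the paper): the threshold halves each step,
    so the spike at step t is worth theta * 2**-(t + 1).
    """
    v = np.asarray(x, dtype=np.float64).copy()   # membrane potential
    spikes = np.zeros((T,) + v.shape, dtype=np.int8)
    for t in range(T):
        thr = theta * 2.0 ** -(t + 1)
        fire = v >= thr                          # integrate-and-fire decision
        spikes[t] = fire
        v = v - fire * thr                       # soft reset: subtract the threshold
    return spikes

def decode(spikes, theta=1.0):
    """Recover the coded value as a weighted sum of the spikes."""
    T = spikes.shape[0]
    weights = theta * 2.0 ** -(np.arange(T) + 1)
    return np.tensordot(weights, spikes.astype(np.float64), axes=(0, 0))

x = np.array([0.0, 0.3, 0.5, 0.8125])
recovered = decode(exp_coded_if(x, T=4))
```

Under these assumptions, any activation that was already quantized to T bits is reproduced exactly, and any other value in [0, theta) is rounded down with an error below theta * 2^-T. This is one way a conversion can be lossless with a small, fixed number of time steps.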

Introduction

As demand for energy-efficient AI models grows, Spiking Neural Networks (SNNs) have emerged as a promising alternative to conventional Artificial Neural Networks (ANNs). Challenges remain, however, particularly for Spiking Vision Transformers (S-ViTs): the two established training paradigms, ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP), each have inherent limitations that prevent memory, accuracy and energy consumption from being optimized at the same time.
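The energy argument for SNNs is usually made by counting operations: a dense ANN layer performs multiply-accumulate (MAC) operations, while a spiking layer only performs an accumulate (AC) when an input spike arrives. The sketch below shows that back-of-the-envelope estimate. The per-operation energies are the 45 nm figures commonly quoted in the SNN literature, and the operation count, firing rate and number of time steps are made-up illustration values, not results from the paper.

```python
# Assumed per-operation energies in picojoules (45 nm figures often
# quoted in SNN papers); treat them as rough constants, not measurements.
E_MAC_PJ = 4.6   # 32-bit floating-point multiply-accumulate
E_AC_PJ = 0.9    # 32-bit floating-point accumulate

def ann_energy_pj(macs):
    """Energy of a dense layer that performs `macs` multiply-accumulates."""
    return macs * E_MAC_PJ

def snn_energy_pj(synaptic_ops, firing_rate, timesteps):
    """Energy of a spiking layer: only active synapses trigger an accumulate."""
    return synaptic_ops * firing_rate * timesteps * E_AC_PJ

# Hypothetical layer: 1e9 synaptic connections, 20% firing rate, 4 time steps.
ann = ann_energy_pj(1e9)
snn = snn_energy_pj(1e9, firing_rate=0.2, timesteps=4)
ratio = ann / snn
```

The estimate makes the trade-off visible: the advantage shrinks as the firing rate or the number of time steps grows, which is why approaches that need few time steps and sparse spikes matter for energy efficiency.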

Overview of Key Innovations

Grouped-Exponential-Coding-based IF (ExpG-IF) Model: An integrate-and-fire (IF) neuron model that enables lossless conversion with constant training overhead and allows spike patterns to be regulated precisely.
Group-wise Spiking Self-Attention (GW-SSA): An attention mechanism that reduces computational complexity through multi-scale token grouping and multiplication-free operations, used within a hybrid attention-convolution framework.
Multi-Dimensional Grouped Computation: Grouping is applied across the temporal, spatial and network structure dimensions so that memory overhead, learning capability and energy budget are addressed together rather than traded off against one another.
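To make the token-grouping idea behind GW-SSA more concrete, here is a minimal NumPy sketch of softmax-free spiking attention restricted to fixed-size token groups. It is an illustration under stated assumptions, not the paper's implementation: the function name `grouped_spiking_attention`, the contiguous grouping scheme and the power-of-two scaling via a bit shift are all hypothetical choices.

```python
import numpy as np

def grouped_spiking_attention(q, k, v, group_size, shift=3):
    """Softmax-free attention over binary spikes, computed per token group.

    q, k, v: arrays of 0/1 spikes with shape (N, d).
    Tokens attend only to tokens in the same contiguous group, so the cost
    drops from O(N^2 * d) to O(N * group_size * d).
    """
    n, d = q.shape
    assert n % group_size == 0, "N must be divisible by group_size"
    g = n // group_size
    qg = q.reshape(g, group_size, d).astype(np.int64)
    kg = k.reshape(g, group_size, d).astype(np.int64)
    vg = v.reshape(g, group_size, d).astype(np.int64)
    # With 0/1 inputs each product is a logical AND, so the scores are
    # counts of coincident spikes. NumPy still multiplies internally;
    # dedicated hardware would only need additions.
    scores = np.einsum("gid,gjd->gij", qg, kg)
    # v is binary too, so this step is a masked sum of the scores.
    out = np.einsum("gij,gjd->gid", scores, vg)
    # Scale by 2**-shift with a bit shift instead of a multiplication.
    return (out >> shift).reshape(n, d)

rng = np.random.default_rng(0)
q = rng.integers(0, 2, size=(8, 4))
k = rng.integers(0, 2, size=(8, 4))
v = rng.integers(0, 2, size=(8, 4))
out = grouped_spiking_attention(q, k, v, group_size=4, shift=1)
```

The result equals full spiking attention with a block-diagonal mask, but the off-diagonal blocks are never computed. A multi-scale variant could call the same function with different `group_size` values at different stages of the network; how the paper actually combines scales is not stated in the abstract.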

Experimental Results and Performance

Our experiments on challenging benchmarks confirm that the Ge$^\text{2}$mS-T architecture achieves superior performance with ultra-high energy efficiency, which supports the effectiveness of the grouped computation strategy.

Conclusion

In conclusion, the Ge$^\text{2}$mS-T architecture applies multi-dimensional grouped computation to Spiking Vision Transformers in order to balance memory overhead, learning capability and energy budget. These findings could inspire further research on optimizing SNNs and broadening their practical applications.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
