Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding
arXiv:2603.28780v1 (announce type: cross)
Abstract: In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation in that the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity among the subsets held by different devices.
Introduction
Distributed training has become increasingly important in machine learning, particularly when data is spread across multiple devices. A significant challenge in this setting is robustness against Byzantine attacks, that is, malicious behavior by some devices that can corrupt the training process. This paper addresses that challenge, under communication constraints, with a method called cyclic gradient coding-based distributed training (LAD).
Challenges in Current Approaches
Existing methods for mitigating Byzantine attacks typically rely on robust aggregation rules applied at the server. These methods share a key limitation:
- The solution error does not diminish when the local gradients sent by different devices vary considerably.
- This variation arises from data heterogeneity among the subsets held by different devices.
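The limitation above can be illustrated with a small sketch. It assumes scalar gradients and uses the median as a stand-in robust aggregation rule; neither the rule nor the numbers come from the paper, and `aggregation_gap` is a hypothetical helper name. Even with no Byzantine device present, the median drifts away from the true mean once the honest gradients are heterogeneous.

```python
from statistics import fmean, median


def aggregation_gap(honest_grads):
    """Absolute gap between the median (an assumed robust rule) and the true mean."""
    return abs(median(honest_grads) - fmean(honest_grads))


# Illustrative numbers only: both lists have the same mean, 4.0.
homogeneous = [4.0, 4.0, 4.0, 4.0, 4.0]
heterogeneous = [1.0, 2.0, 3.0, 4.0, 10.0]

print(aggregation_gap(homogeneous))    # 0.0
print(aggregation_gap(heterogeneous))  # 1.0
```

The gap in the heterogeneous case is caused by the spread of the honest gradients alone, which is why reducing that spread before aggregation is attractive.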
Proposed Solution: Cyclic Gradient Coding-Based Distributed Training (LAD)
The LAD method offers a fresh perspective on tackling Byzantine resilience in distributed training. Here’s an overview of how it works:
- Data Allocation: Before the training process begins, the server distributes the entire training dataset among the devices.
- Cyclic Gradient Coding: During each iteration, the server assigns computational tasks redundantly to the devices using cyclic gradient coding.
- Local Computation: Each honest device computes local gradients on a fixed number of data subsets and encodes these gradients before transmission.
- Robust Aggregation: The server applies a robust aggregation rule to the coded vectors sent by honest devices together with the potentially corrupted messages sent by Byzantine devices.
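The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the data is split into as many subsets as there are devices, device i is assigned a cyclic window of `load` consecutive subsets, encoding is a plain average of the assigned subset gradients, and the robust rule is the coordinate-wise median. The function names and the toy gradients are hypothetical.

```python
from statistics import median


def cyclic_assignment(num_devices, load):
    """Device i gets subsets i, i+1, ..., i+load-1 (mod num_devices)."""
    return [[(i + j) % num_devices for j in range(load)]
            for i in range(num_devices)]


def encode(subset_grads, assigned):
    """Assumed encoder: average the gradients of the assigned subsets."""
    dim = len(subset_grads[0])
    return [sum(subset_grads[k][c] for k in assigned) / len(assigned)
            for c in range(dim)]


def coordinate_median(vectors):
    """Assumed robust rule: median taken independently per coordinate."""
    return [median(column) for column in zip(*vectors)]


# Toy one-dimensional gradients, one per data subset (illustrative values).
subset_grads = [[1.0], [2.0], [3.0], [4.0], [10.0]]

assignment = cyclic_assignment(num_devices=5, load=3)
honest_coded = [encode(subset_grads, a) for a in assignment]

# Assume device 0 is Byzantine and replaces its message with an arbitrary value.
received = [list(v) for v in honest_coded]
received[0] = [100.0]

aggregate = coordinate_median(received)
print(assignment[4])  # [4, 0, 1]
print(aggregate)
```

In this toy run every subset is covered by exactly `load` devices, and the coded vectors are less spread out than the raw subset gradients, which is the effect the redundancy is meant to achieve before robust aggregation.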
Analytical Characterization and Results
The convergence performance of LAD has been analytically characterized, revealing its enhanced robustness against Byzantine attacks and a significant reduction in solution error compared to existing methods. Furthermore, the paper introduces a communication-efficient variant of LAD, termed compressive and cyclic gradient coding-based distributed training (Com-LAD), designed to further minimize communication overhead in constrained environments.
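The summary does not say which compression scheme Com-LAD uses. As a purely illustrative assumption, the sketch below applies top-k sparsification, a common gradient compressor, to a coded vector before transmission; `top_k_sparsify` is a hypothetical helper and should not be read as the paper's compressor.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


coded_vector = [0.1, -3.0, 0.5, 2.0]  # illustrative values
compressed = top_k_sparsify(coded_vector, k=2)
print(compressed)  # [0.0, -3.0, 0.0, 2.0]
```

Only the retained entries and their indices would need to be sent, which is where the communication saving comes from in schemes of this kind.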
Conclusion
Experimental results demonstrate that both LAD and Com-LAD improve Byzantine resilience while also enhancing communication efficiency. The resulting framework is aimed at real-world scenarios where data security and communication constraints are critical.
