Efficient Adversarial Training via Criticality-Aware Fine-Tuning
arXiv:2604.12780v1 (cross-listed)
Abstract
Vision Transformer (ViT) models have achieved remarkable performance across vision tasks. A key advantage is their scalability: trained on large datasets, they exhibit strong generalization. However, as the number of parameters increases, the robustness of ViT models to adversarial examples does not improve proportionally.
Introduction
Adversarial training (AT) is one of the most effective methods for improving the robustness of machine learning models, particularly in computer vision. Despite the notable success of ViT models, their vulnerability to adversarial attacks remains a pressing concern. Traditional adversarial training typically fine-tunes the entire model, which becomes prohibitively expensive for large architectures.
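To make the cost argument concrete, here is a minimal sketch of one full-model adversarial training step. It is an illustration under stated assumptions, not the paper's setup: the model is a two-weight logistic regression rather than a ViT, the attack is single-step FGSM (the source does not say which attack is used), and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def gradients(w, b, x, y):
    """Gradients of the loss w.r.t. the weights, the bias, and the input."""
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dL/dz
    return [err * xi for xi in x], err, [err * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One-step L-infinity attack: shift each input coordinate by eps uphill."""
    _, _, grad_x = gradients(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]

def adversarial_training_step(w, b, x, y, eps=0.1, lr=0.5):
    """Craft an adversarial input, then update ALL parameters on it."""
    x_adv = fgsm(w, b, x, y, eps)
    grad_w, grad_b, _ = gradients(w, b, x_adv, y)
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b, x_adv

# Toy values chosen for illustration only.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.25], 1
new_w, new_b, x_adv = adversarial_training_step(w, b, x, y)
print("clean loss:          ", bce_loss(w, b, x, y))
print("adversarial loss:    ", bce_loss(w, b, x_adv, y))
print("after one AT update: ", bce_loss(new_w, new_b, x_adv, y))
```

In this toy run the attack raises the loss and the update lowers it again. The point for large models is that every parameter receives a gradient and an update at each step, on top of the extra forward and backward passes needed to craft the adversarial input.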
Proposed Method: Criticality-Aware Adversarial Training (CAAT)
In this paper, we introduce Criticality-Aware Adversarial Training (CAAT), a novel method designed to efficiently fine-tune only a small subset of parameters while maintaining robust performance against adversarial examples. The primary objective of CAAT is to allocate resources adaptively to parameters deemed most critical for ensuring robustness.
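One way to formalize this idea is to start from the standard min-max formulation of adversarial training and restrict the outer minimization to a subset of parameters. The source does not state its objective, so the notation below is an assumption introduced for illustration: \theta_C denotes the critical (trainable) parameters, \theta_F the frozen ones, \epsilon the perturbation budget, and \mathcal{L} the training loss.

```latex
% Standard adversarial training: every parameter in \theta is updated
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\|_p \le \epsilon}
        \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \Big]

% Restricted variant: \theta = (\theta_C, \theta_F), only \theta_C is updated
\min_{\theta_C} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\|_p \le \epsilon}
        \mathcal{L}\big(f_{(\theta_C,\,\theta_F)}(x+\delta),\, y\big) \Big],
  \qquad |\theta_C| \ll |\theta|
```

The inner maximization is unchanged. Only the set of parameters that receives updates shrinks, which is where the savings in trainable parameters come from.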
Key Features of CAAT
- Resource Allocation: CAAT identifies parameters that contribute significantly to adversarial robustness, allowing for targeted fine-tuning.
- Parameter-Efficient Fine-Tuning (PEFT): When the number of critical parameters exceeds a predefined threshold, CAAT employs PEFT strategies to robustly adjust the relevant weight matrices.
- Scalability: CAAT is designed to scale effectively with larger ViT architectures, making it a viable solution for adversarial training at scale.
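The features above can be sketched as a small planning routine. This is a hypothetical illustration, not the paper's algorithm: the source does not define how criticality is measured, so the score used here (the magnitude of weight times gradient, a common first-order saliency heuristic) is an assumption, as are the function names, the per-tensor counting, and the threshold semantics.

```python
from typing import Dict, List, Tuple

Plan = Dict[str, Tuple[str, List[int]]]

def criticality_scores(weights: Dict[str, List[float]],
                       grads: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """ASSUMED score: |w * g| per parameter, with g taken from an adversarial loss."""
    return {name: [abs(w * g) for w, g in zip(weights[name], grads[name])]
            for name in weights}

def plan_updates(scores: Dict[str, List[float]],
                 budget_fraction: float = 0.06,
                 peft_threshold: int = 2) -> Plan:
    """Mark the top `budget_fraction` of parameters as critical, then choose
    an update mode for each tensor."""
    ranked = sorted((s for vals in scores.values() for s in vals), reverse=True)
    k = max(1, round(budget_fraction * len(ranked)))
    cutoff = ranked[k - 1]  # ties at the cutoff may select slightly more than k
    plan: Plan = {}
    for name, vals in scores.items():
        critical = [i for i, s in enumerate(vals) if s >= cutoff]
        if not critical:
            mode = "frozen"   # nothing critical here: leave the tensor untouched
        elif len(critical) > peft_threshold:
            mode = "peft"     # too many to tune one by one: use a PEFT update
        else:
            mode = "direct"   # few enough to fine-tune directly
        plan[name] = (mode, critical)
    return plan

# Toy tensors, flattened to lists; the names and values are illustrative only.
weights = {"attn.qkv": [1.0, -2.0, 0.5, 3.0],
           "mlp.fc1": [0.1, 0.2, 0.1, 0.3],
           "head": [1.0, 1.0]}
grads = {"attn.qkv": [0.5, 0.4, 0.2, 0.3],
         "mlp.fc1": [0.1, 0.1, 0.1, 0.1],
         "head": [0.05, 0.6]}
plan = plan_updates(criticality_scores(weights, grads),
                    budget_fraction=0.4, peft_threshold=2)
for name, (mode, idx) in plan.items():
    print(name, mode, idx)
```

In a real model the "peft" branch would typically attach something like a low-rank adapter to the weight matrix, and the "direct" branch would mask gradients so that only the selected entries change. Both are left abstract here because the source does not specify them.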
Results and Findings
Experiments on three widely used adversarial learning datasets show that CAAT outperforms state-of-the-art lightweight adversarial training methods while using significantly fewer trainable parameters. Notably, compared with adversarially fine-tuning the entire model, CAAT incurs only a 4.3% decrease in adversarial robustness while fine-tuning approximately 6% of the model's parameters.
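To give a sense of scale for the 6% figure, the short calculation below applies it to the approximate parameter counts commonly quoted for the standard ViT variants. These counts are an assumption brought in for illustration: the source does not say which model sizes were evaluated.

```python
# Approximate parameter counts commonly quoted for standard ViT variants
# (ASSUMED for illustration; not taken from the source).
VIT_PARAMS = {"ViT-B": 86_000_000, "ViT-L": 307_000_000, "ViT-H": 632_000_000}

def trainable_budget(total_params: int, fraction: float = 0.06) -> int:
    """Number of parameters updated when only `fraction` of the model is tuned."""
    return round(total_params * fraction)

for name, total in VIT_PARAMS.items():
    budget = trainable_budget(total)
    print(f"{name}: ~{budget / 1e6:.2f}M of {total / 1e6:.0f}M parameters trainable")
```

Under these assumed sizes, a 6% budget leaves roughly 5 million trainable parameters for a ViT-B-sized model and under 40 million for a ViT-H-sized one.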
Conclusion
These findings highlight the potential of CAAT to make adversarial training feasible for large ViT architectures by addressing the computational limitations of traditional methods. By concentrating updates on critical parameters, CAAT retains most of the robustness of full adversarial training at a fraction of the trainable-parameter cost, and it opens the way for further research on efficient adversarial training. The approach may also extend beyond ViTs to other model families.
