Safe-FedLLM: Delving into the Safety of Federated Large Language Models
Summary: arXiv:2601.07177v3 Announce Type: replace-cross
Abstract: Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defense against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study of potential attack surfaces and defensive characteristics from the perspective of Low-Rank Adaptation (LoRA) updates.
Our research identifies two key properties of FedLLM:
- LLMs are vulnerable to attacks from malicious clients in FL.
- LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers.
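The second property can be illustrated with a minimal sketch, assuming a logistic-regression probe over flattened LoRA factor matrices. The abstract does not specify the classifier, so the probe type, the helper names (`flatten_lora_update`, `train_probe`, `probe_score`, `synthetic_update`), and the synthetic data (malicious updates modeled as a simple mean shift) are all illustrative assumptions rather than the paper's method.

```python
import math
import random

def flatten_lora_update(lora_a, lora_b):
    """Concatenate the two LoRA factor matrices into one feature vector."""
    return [v for row in lora_a for v in row] + [v for row in lora_b for v in row]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(weights, bias, features):
    """Estimated probability that an update is malicious."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_probe(samples, labels, lr=0.5, epochs=300):
    """Fit logistic regression by full-batch gradient descent."""
    dim, n = len(samples[0]), len(samples)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            err = probe_score(weights, bias, x) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def synthetic_update(rng, malicious, rank=2, width=4):
    """Toy stand-in for a client's LoRA update (assumption, not real data)."""
    shift = 0.5 if malicious else 0.0
    lora_a = [[rng.gauss(shift, 0.1) for _ in range(width)] for _ in range(rank)]
    lora_b = [[rng.gauss(shift, 0.1) for _ in range(rank)] for _ in range(width)]
    return flatten_lora_update(lora_a, lora_b)

# Label 1 = malicious, 0 = benign; alternate labels for a balanced toy set.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
samples = [synthetic_update(rng, bool(y)) for y in labels]
weights, bias = train_probe(samples, labels)

# Score two unseen toy updates.
benign_score = probe_score(weights, bias, synthetic_update(rng, False))
malicious_score = probe_score(weights, bias, synthetic_update(rng, True))
print(f"benign={benign_score:.3f} malicious={malicious_score:.3f}")
```

In this toy setting the two classes are separated by construction, so the sketch only shows the mechanics of probing an update vector. It says nothing about how separable real benign and malicious LoRA updates are.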
Based on these findings, we propose Safe-FedLLM, a probe-based defense framework for FedLLM. This framework constructs defenses across three levels:
- Step-Level: This level focuses on the immediate actions taken during the training process to identify and mitigate threats.
- Client-Level: This level analyzes client behavior and interactions within the federated learning system to detect anomalies.
- Shadow-Level: This level encompasses a broader view of the system’s architecture, assessing overall security and resilience against potential threats.
The core concept of Safe-FedLLM is to perform probe-based discrimination on each client’s local LoRA updates. These updates are treated as high-dimensional behavioral features and passed to a lightweight classifier that determines whether they are malicious. Extensive experiments demonstrate that Safe-FedLLM significantly enhances FedLLM’s robustness against malicious clients while maintaining competitive performance on benign data.
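One way a probe's output could feed into server-side aggregation is sketched below, assuming plain averaging over the updates that the probe does not flag. The abstract does not state the aggregation rule or the decision threshold, so `filter_and_aggregate`, `toy_score`, the 0.5 threshold, and the example update vectors are hypothetical placeholders.

```python
def filter_and_aggregate(client_updates, score_fn, threshold=0.5):
    """Average only the updates whose malicious score is below the threshold.

    client_updates maps a client id to its flattened LoRA update.
    Returns (averaged_update, kept_client_ids); the average is None
    when every update is rejected.
    """
    kept = {cid: u for cid, u in client_updates.items() if score_fn(u) < threshold}
    if not kept:
        return None, []
    dim = len(next(iter(kept.values())))
    averaged = [sum(u[i] for u in kept.values()) / len(kept) for i in range(dim)]
    return averaged, sorted(kept)

def toy_score(update):
    """Placeholder for a trained probe: flags updates with a large mean value."""
    return 1.0 if sum(update) / len(update) > 0.25 else 0.0

updates = {
    "client_a": [0.1, -0.1, 0.0, 0.2],
    "client_b": [0.0, 0.1, 0.1, 0.0],
    "client_c": [0.6, 0.5, 0.4, 0.5],  # stands in for a malicious client
}
aggregated, kept_ids = filter_and_aggregate(updates, toy_score)
print(kept_ids, aggregated)
```

Dropping flagged updates entirely is only one possible design. Down-weighting them by probe score would be an alternative, and the abstract does not say which the framework uses.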
A notable advantage of Safe-FedLLM is that it suppresses the impact of malicious data without significantly affecting training speed, which matters in federated settings where training time is often a critical factor. Safe-FedLLM also remains effective when the ratio of malicious clients is high.
In conclusion, as federated learning for large language models is deployed more widely, addressing its security becomes increasingly important. Safe-FedLLM offers a promising approach to improving the safety of these systems, so that the benefits of federated learning can be realized without compromising security. Future work will focus on refining these defense mechanisms and exploring their applicability across different federated learning scenarios.
