Task-Centric Personalized Federated Fine-Tuning of Language Models
arXiv:2604.00050v1
Introduction
Federated Learning (FL) has emerged as an important technique for training language models on distributed, private datasets. It lets multiple clients collaborate without exchanging their data, which preserves privacy. However, FL faces a key challenge: an aggregated model trained on heterogeneous tasks can perform worse on individual clients.
The Challenge of Heterogeneous Tasks
In the context of language model training, clients often work with varying tasks, leading to inconsistencies in model performance. Traditional methods of Personalized Federated Learning (pFL) aim to create models tailored to each client’s unique data distribution. While these methods enhance local performance, they exhibit shortcomings in two critical areas:
- Generalization: Clients may need to make predictions on unseen tasks or adapt to shifts in their data distributions.
- Intra-client Task Interference: A single client’s dataset may contain multiple distributions that can interfere with each other during local model training.
Introducing FedRouter
To address these challenges, we introduce FedRouter, a novel clustering-based approach to pFL. Unlike conventional methods that develop models for each client, FedRouter focuses on creating specialized models for each task. This method utilizes adapters to personalize models through two distinct clustering mechanisms:
- Local Clustering: This mechanism associates adapters with specific task data samples, ensuring that each adapter is optimized for its corresponding task.
- Global Clustering: This approach links similar adapters from different clients to construct task-centric personalized models, enhancing collaboration and model efficacy.
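The two clustering steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the source does not specify the clustering algorithms, so k-means over sample embeddings (local) and greedy cosine-similarity grouping of flattened adapter weights (global) are assumptions, as are all function names, the `threshold` parameter, and the toy data.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def local_clustering(embeddings, k, iters=10):
    """Client side (assumed: k-means). Returns (centroids, labels).

    labels[i] is the index of the local cluster, and hence the adapter,
    that sample i is associated with.
    """
    # Deterministic farthest-point initialisation of the k centroids.
    centroids = [embeddings[0]]
    while len(centroids) < k:
        centroids.append(
            max(embeddings, key=lambda e: min(sq_dist(e, c) for c in centroids))
        )

    def assign():
        return [min(range(k), key=lambda j: sq_dist(e, centroids[j])) for e in embeddings]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids, assign()

def global_clustering(adapters, threshold=0.9):
    """Server side (assumed: greedy cosine grouping).

    adapters: list of (client_id, flattened_adapter_weights).
    Each group keeps its member ids and the average of its members' weights.
    """
    groups = []
    for client_id, weights in adapters:
        for group in groups:
            if cosine(weights, group["weights"]) >= threshold:
                group["members"].append(client_id)
                group["vectors"].append(weights)
                group["weights"] = mean(group["vectors"])
                break
        else:
            # No sufficiently similar group exists, so start a new one.
            groups.append({"members": [client_id], "vectors": [weights], "weights": weights})
    return groups

# Toy data: one client's sample embeddings fall into two obvious groups.
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
centroids, labels = local_clustering(embeddings, k=2)

# Toy adapters from two hypothetical clients; client_a holds two adapters.
adapters = [("client_a", [1.0, 0.0]), ("client_b", [0.9, 0.1]), ("client_a", [0.0, 1.0])]
groups = global_clustering(adapters)
```

In this sketch a client can contribute several adapters, one per local cluster, and each adapter can land in a different global group. That is the sense in which the resulting models are task-centric rather than client-centric.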
Evaluation Router Mechanism
Additionally, we propose an evaluation router mechanism that routes each test sample to the most suitable adapter based on the established clusters. Routing per sample lets a client with mixed-task inputs use, for each input, the adapter specialized for the matching task.
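One simple way to picture such a router is nearest-centroid lookup. The sketch below assumes that each adapter is represented by the centroid of its cluster in some embedding space and that the closest centroid wins. The source only states that routing is based on the established clusters, so the distance rule, the `route` function, and the task names are all hypothetical.

```python
def route(sample_embedding, cluster_centroids):
    """Return the adapter id whose cluster centroid is nearest to the sample.

    cluster_centroids: dict mapping adapter id -> centroid vector.
    Nearest-centroid routing is an assumption, not the paper's stated rule.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        cluster_centroids,
        key=lambda adapter_id: sq_dist(sample_embedding, cluster_centroids[adapter_id]),
    )

# Hypothetical adapters and centroids, for illustration only.
centroids = {"summarization": [0.0, 1.0], "translation": [1.0, 0.0]}
chosen = route([0.9, 0.2], centroids)
```

Here the test embedding lies much closer to the second centroid, so `chosen` is `"translation"`.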
Results and Performance
In experiments comparing FedRouter against existing approaches on a multitask dataset, FedRouter proved resilient in these challenging scenarios. It achieves up to a 6.1% relative improvement under task interference and a 136% relative improvement in generalization evaluations.
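For readers unfamiliar with the term, a relative improvement is measured against the baseline score rather than in absolute points. The helper below shows the arithmetic. The function name and the example scores are made up for illustration and are not values from the paper.

```python
def relative_improvement(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score

# Made-up scores: moving from 40 to 50 is a 10-point absolute gain
# but a 25% relative gain.
example = relative_improvement(50.0, 40.0)
```

By the same formula, a 6.1% relative improvement means a score 1.061 times the baseline, and a 136% relative improvement means a score 2.36 times the baseline.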
Conclusion
In summary, FedRouter addresses two critical issues in personalized Federated Learning: generalization and intra-client task interference. By building task-centric models rather than client-centric ones, it supports more robust and effective language model training across diverse datasets. As demand for privacy-preserving AI solutions grows, approaches like FedRouter become increasingly relevant to personalized AI.
