KD-MARL: Resource-Aware Knowledge Distillation in Multi-Agent Reinforcement Learning
arXiv:2604.06691v1
Abstract: The real-world deployment of multi-agent reinforcement learning (MARL) systems faces significant constraints, particularly limited compute, memory, and inference time. While expert policies can achieve impressive performance, they often require costly decision cycles and large-scale models that are impractical on edge devices or embedded platforms. Knowledge distillation (KD) offers a promising route to resource-aware execution; however, existing KD methods in MARL have largely focused on action imitation. Such imitation often neglects the coordination structure among agents and assumes that all agents have uniform capabilities.
To address these limitations, we propose Resource-Aware Knowledge Distillation for Multi-Agent Reinforcement Learning (KD-MARL), a two-stage framework designed to transfer coordinated behavior from a centralized expert to lightweight decentralized student agents. Our approach emphasizes the importance of preserving both action-level behaviors and structural coordination patterns from expert policies. This is accomplished by training the student policies without a critic, instead relying on distilled advantage signals and structured policy supervision. Such a strategy enables the retention of coordination under heterogeneous and limited observations.
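The critic-free idea can be made concrete with a small sketch. The abstract does not give the objective, so everything below is an assumption for illustration: the function name `distillation_loss`, the weight `beta`, and the choice to combine a KL term toward the expert's action distribution with an advantage-weighted log-likelihood term. It covers only the action-level and advantage signals; the structural coordination supervision is not attempted here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, teacher_adv, actions, beta=1.0):
    """Illustrative critic-free student loss (not the paper's exact objective).

    student_logits, teacher_logits: arrays of shape (batch, n_actions)
    teacher_adv: shape (batch,), advantages supplied by the expert,
                 so the student needs no critic of its own
    actions: shape (batch,), integer actions chosen by the expert
    beta: hypothetical weight on the advantage term
    """
    eps = 1e-8
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    # KL(teacher || student): pulls the student toward the expert's policy.
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    # Advantage-weighted log-likelihood of the expert's actions.
    logp_a = np.log(p_s[np.arange(len(actions)), actions] + eps)
    adv_term = -(teacher_adv * logp_a)
    return float(np.mean(kl + beta * adv_term))
```

With identical student and teacher logits and zero advantages, both terms vanish; any mismatch in the action distributions makes the KL term positive.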
Key Features of KD-MARL
- Resource Efficiency: The KD-MARL framework is designed to operate effectively on resource-constrained devices, making it suitable for practical deployment in real-world scenarios.
- Heterogeneous Agent Support: The method allows agents to use different architectures, so that each agent's model capacity can match the complexity of its observations.
- High Performance Retention: Extensive experiments conducted on SMAC and MPE benchmarks have shown that KD-MARL achieves over 90% of expert performance while significantly reducing computational costs.
- Structured Distillation: Structured distillation is intended to transfer the expert's coordination patterns to the students, rather than only its per-agent actions.
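As a rough illustration of matching model capacity to observation complexity, the sketch below sizes a one-hidden-layer policy per agent. The sizing rule `hidden_width`, its constants, the agent names, and the observation sizes are all hypothetical; the source does not say how student capacities are chosen.

```python
def hidden_width(obs_dim, base=16, per_feature=2, cap=128):
    """Hypothetical rule: grow the hidden layer with observation size, up to a cap."""
    return min(cap, base + per_feature * obs_dim)

def mlp_param_count(obs_dim, hidden, n_actions):
    """Weights plus biases of a one-hidden-layer MLP policy."""
    return (obs_dim * hidden + hidden) + (hidden * n_actions + n_actions)

# Hypothetical agents with different observation sizes.
agents = {"scout": 12, "medic": 30, "commander": 80}
n_actions = 6

for name, obs_dim in agents.items():
    h = hidden_width(obs_dim)
    print(name, "hidden:", h, "params:", mlp_param_count(obs_dim, h, n_actions))
```

Agents with small observation vectors end up with small networks, while the cap keeps the largest student bounded.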
Experimental Results
On the SMAC and MPE benchmarks, KD-MARL reduces computational cost by a factor of up to 28.6, measured in floating-point operations (FLOPs), while retaining over 90% of expert performance. This suggests that MARL systems can be deployed on devices with limited computational resources at only a modest cost in performance.
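A reduction factor of this kind is a ratio of operation counts for one forward pass. The sketch below shows how such a ratio could be estimated for dense networks. The layer sizes are hypothetical and not taken from the paper, and the count uses the common approximation of two operations (one multiply, one add) per weight.

```python
def mlp_flops(layer_sizes):
    """Approximate FLOPs for one forward pass of a dense MLP.

    Counts 2 * n_in * n_out per layer; biases and activations are ignored.
    """
    return sum(2 * n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical architectures, not from the paper.
expert = [64, 512, 512, 10]
student = [64, 48, 10]

ratio = mlp_flops(expert) / mlp_flops(student)
print(f"expert/student FLOPs ratio: {ratio:.1f}x")
```

Because the count depends only on layer shapes, it measures the work per decision and is independent of the hardware's throughput.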
Conclusion
In conclusion, KD-MARL addresses the difficulty of deploying multi-agent reinforcement learning systems in resource-constrained environments. By transferring coordinated behavior from a centralized expert to lightweight students and training those students without a critic, it makes MARL more practical for domains such as robotics, autonomous systems, and real-time decision-making platforms. The framework also offers a starting point for further research on knowledge distillation in reinforcement learning.
