Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing
The proliferation of small Unmanned Aircraft Systems (sUAS) has transformed aerial operations. However, their reliance on the Global Positioning System (GPS) for navigation and coordination is a significant vulnerability, particularly when GPS signals are degraded or spoofed. A new research paper, titled “Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing,” addresses these challenges through an application of Multi-Agent Reinforcement Learning (MARL).
Understanding the Challenge
The paper highlights that in cooperative surveillance systems, each sUAS broadcasts its GPS-derived position so that all aircraft can maintain situational awareness. When these broadcasts are corrupted, the shared picture of the air traffic state becomes unreliable, which can lead to safety hazards. The vulnerability is compounded in high-density sUAS operations, where coordination and separation assurance are paramount.
Proposed Solution
The authors frame the corruption of state observations as a zero-sum game between the agents (sUAS) and an adversary attempting to disrupt their operations. With probability R, the adversary perturbs the observed state in the way that maximally degrades each agent’s safety performance. The researchers derive a closed-form expression for this adversarial perturbation. The expression can be evaluated in time linear in the state dimension, which removes the need for extensive adversarial training.
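This summary does not reproduce the paper's closed-form expression, so the following is only a minimal sketch of the general idea, using a generic first-order construction as a stand-in. It assumes the adversary has an L2 budget `eps` and access to the gradient `grad` of the agent's safety value with respect to the observation. The function names, the L2 budget, and the gradient input are all assumptions made for illustration, not the authors' formulation. The cost is one pass over the state vector, which is what linear-time evaluation means here.

```python
import math
import random


def first_order_worst_case(grad, eps):
    """Perturbation that minimizes the linearized value grad . delta
    subject to ||delta||_2 <= eps (illustrative stand-in, not the paper's formula)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A flat linear model gives the adversary no preferred direction.
        return [0.0] * len(grad)
    # Step against the gradient, scaled to the full budget: O(len(grad)) work.
    return [-eps * g / norm for g in grad]


def corrupt_observation(obs, grad, eps, r, rng):
    """With probability r, add the worst-case perturbation; otherwise return a clean copy."""
    if rng.random() < r:
        delta = first_order_worst_case(grad, eps)
        return [o + d for o, d in zip(obs, delta)]
    return list(obs)


# Hypothetical usage: a 2-D position observation and a made-up value gradient.
rng = random.Random(0)
observed = corrupt_observation([10.0, 20.0], [3.0, 4.0], eps=0.5, r=0.35, rng=rng)
```

Because the perturbation is computed directly from the gradient, no inner optimization loop or separately trained adversary network is needed in this sketch.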
Key Findings
The paper presents several critical findings:
- The derived expression for the adversarial perturbation approximates the true worst-case perturbation with second-order accuracy.
- The safety performance gap between clean and corrupted observations is bounded: under Kullback-Leibler regularization, performance degrades at most linearly in the corruption probability.
- Integration of the closed-form adversarial policy into a MARL policy gradient algorithm results in a robust counter-policy for the agents, enhancing their resilience against GPS spoofing and degradation.
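To make the first two findings more concrete, here is a toy one-dimensional illustration. It is not the paper's analysis. It assumes a hypothetical quadratic value model V(d) = g*d + 0.5*h*d^2 around the clean observation and reads "second-order accuracy" as an approximation error that shrinks with the square of the perturbation budget. The one-step `expected_gap` function ignores the Kullback-Leibler regularization and any multi-step effects, and only shows why a mixture with corruption probability R gives a loss that is linear in R. All names and numbers are assumptions.

```python
def true_worst_case_value(g, h, eps, steps=2001):
    """Brute-force minimum of V(d) = g*d + 0.5*h*d*d over a grid on [-eps, eps]."""
    best = float("inf")
    for i in range(steps):
        d = -eps + 2.0 * eps * i / (steps - 1)
        best = min(best, g * d + 0.5 * h * d * d)
    return best


def linearized_worst_case_value(g, eps):
    """Value change predicted by the first-order model at d = -eps * sign(g)."""
    return -eps * abs(g)


def approximation_error(g, h, eps):
    """Gap between the brute-force worst case and the first-order prediction."""
    return abs(true_worst_case_value(g, h, eps) - linearized_worst_case_value(g, eps))


def expected_gap(r, v_clean, v_adv):
    """One-step expected loss when the observation is corrupted with probability r:
    v_clean - ((1 - r) * v_clean + r * v_adv) = r * (v_clean - v_adv)."""
    return r * (v_clean - v_adv)


# Halving the budget should cut the error by about a factor of four in this toy model.
error_ratio = approximation_error(1.0, 2.0, 0.1) / approximation_error(1.0, 2.0, 0.05)
# Doubling the corruption probability doubles the one-step gap.
gap_ratio = expected_gap(0.2, 1.0, 0.5) / expected_gap(0.1, 1.0, 0.5)
```

In this toy setting the error equals 0.5*h*eps^2, so it is quadratic in the budget, while the expected gap scales with the corruption probability by construction.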
Simulation Results
In a series of high-density sUAS simulations, agents trained with the proposed approach maintained near-zero collision rates even under corruption levels of up to 35%. By contrast, policies trained without accounting for adversarial perturbations showed significantly higher collision rates.
Conclusion
The research advances the safety and reliability of sUAS operations in the face of GPS vulnerabilities. By using Multi-Agent Reinforcement Learning to train robust counter-policies, the study improves operational safety and offers a starting point for future research in the field. The implications extend beyond sUAS and may benefit other domains that rely on cooperative multi-agent systems.
For further reading, the full paper is available on arXiv under the identifier arXiv:2603.28900v1.
