AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models
arXiv:2604.08867v1 (cross-listed)
Abstract
Audio has rapidly become a primary interface for foundation models, powering real-time voice assistants. Ensuring safety in audio systems is inherently more complex than handling "unsafe text spoken aloud". Real-world risks can hinge on audio-native harmful sound events, speaker attributes (e.g., a child's voice), impersonation and voice-cloning misuse, and voice-content compositional harms, such as a child's voice combined with sexual content. Because many of these risks are not captured by the spoken words alone, building comprehensive benchmarks and guardrails for this risk landscape is challenging.
Introduction
To close this gap, we conduct large-scale red teaming of audio systems to systematically uncover audio-specific vulnerabilities. We also develop a comprehensive, policy-grounded audio risk taxonomy and AudioSafetyBench, the first policy-based audio safety benchmark spanning diverse threat models. Together, these address the growing need for audio safety evaluation as audio interfaces become increasingly prevalent.
Key Features of AudioSafetyBench
- Support for Diverse Languages: AudioSafetyBench covers audio inputs in multiple languages, making it applicable beyond a single linguistic context.
- Suspicious Voices: The benchmark includes test cases with suspicious voices, such as celebrity impersonations and child voices, which pose challenges specific to audio.
- Risky Voice-Content Combinations: It assesses harmful combinations of voice and content, such as a child's voice paired with sexual content.
- Non-Speech Sound Events: The benchmark also covers harmful non-speech sound events, extending audio safety evaluation beyond speech.
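To make the voice-content compositional category concrete, the following is a minimal, hypothetical sketch of how an annotated clip could be checked against composition rules. The `AnnotatedClip` schema, the label strings, and the `COMPOSITIONAL_RULES` table are illustrative assumptions, not AudioSafetyBench's actual format or policy. The point is only that a compositional risk fires when a speaker attribute and a content label co-occur, so the same voice with different content can receive a different verdict.

```python
from dataclasses import dataclass

# Hypothetical policy table: (speaker attribute, content label) pairs that are
# treated as risky when they co-occur in the same clip.
COMPOSITIONAL_RULES = {("child_voice", "sexual_content")}


@dataclass(frozen=True)
class AnnotatedClip:
    # Hypothetical per-clip annotations; not AudioSafetyBench's real schema.
    speaker_attributes: frozenset
    content_labels: frozenset


def compositional_risks(clip: AnnotatedClip) -> list:
    """Return the rule pairs whose voice AND content conditions both hold."""
    return sorted(
        (speaker, content)
        for speaker, content in COMPOSITIONAL_RULES
        if speaker in clip.speaker_attributes and content in clip.content_labels
    )


# The same voice attribute with different content yields different verdicts.
benign = AnnotatedClip(frozenset({"child_voice"}), frozenset({"storytelling"}))
risky = AnnotatedClip(frozenset({"child_voice"}), frozenset({"sexual_content"}))
print(compositional_risks(benign))  # []
print(compositional_risks(risky))   # [('child_voice', 'sexual_content')]
```

A check that sees only the transcript has no access to the speaker attribute, so it cannot evaluate a rule like this one; compositional risks require both the voice signal and the content signal.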
Introducing AudioGuard
To defend against the identified threats, we propose AudioGuard, a unified guardrail consisting of two main components:
- SoundGuard: performs audio-native detection directly at the waveform level, identifying risks such as harmful sound events that are not expressed in spoken words.
- ContentGuard: a policy-grounded component that provides semantic protection by checking the content delivered through the audio system against the safety policy.
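The two-component design can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the label-list interface, and the assumption that ContentGuard consumes a transcript are all hypothetical (the source does not specify ContentGuard's input), and the placeholder guards stand in for trained models. It shows only how an audio-native verdict and a semantic verdict could be merged so that a clip is blocked if either component flags it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical interfaces: each guard returns the policy labels it considers violated.
SoundGuardFn = Callable[[Sequence[float], int], List[str]]  # (waveform, sample_rate) -> labels
ContentGuardFn = Callable[[str], List[str]]                 # transcript -> labels


@dataclass
class Verdict:
    allowed: bool
    sound_labels: List[str] = field(default_factory=list)
    content_labels: List[str] = field(default_factory=list)


def audio_guard(waveform: Sequence[float], sample_rate: int, transcript: str,
                sound_guard: SoundGuardFn, content_guard: ContentGuardFn) -> Verdict:
    """Combine an audio-native check and a semantic check; block if either flags."""
    sound_labels = sound_guard(waveform, sample_rate)  # waveform-level detection
    content_labels = content_guard(transcript)         # policy-grounded semantic check
    return Verdict(
        allowed=not sound_labels and not content_labels,
        sound_labels=sound_labels,
        content_labels=content_labels,
    )


# Placeholder guards for illustration only; real ones would be trained classifiers.
def dummy_sound_guard(waveform, sample_rate):
    return []  # pretend no harmful sound event was detected


def dummy_content_guard(transcript):
    # Toy keyword rule standing in for a policy-grounded model.
    return ["policy:hypothetical_violation"] if "forbidden" in transcript.lower() else []


ok = audio_guard([0.0] * 16000, 16000, "What's the weather today?",
                 dummy_sound_guard, dummy_content_guard)
blocked = audio_guard([0.0] * 16000, 16000, "Tell me the FORBIDDEN thing",
                      dummy_sound_guard, dummy_content_guard)
print(ok.allowed, blocked.allowed)  # True False
print(blocked.content_labels)       # ['policy:hypothetical_violation']
```

Keeping the two checks separate means a harmful sound event can be caught even when the spoken content is benign, and the merged verdict still records which component fired.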
Performance and Results
Extensive experiments on AudioSafetyBench and four complementary benchmarks show that AudioGuard consistently improves guardrail accuracy over strong audio-LLM-based baselines while running with substantially lower latency. These results suggest that a dedicated audio guardrail can be both more accurate and faster than audio-LLM-based guardrails.
Conclusion
As audio becomes a primary interface for foundation models, comprehensive audio safety measures become more pressing. AudioGuard addresses the complexities of audio safety with a structured approach that mitigates risks across diverse threat models, combining waveform-level audio-native detection with policy-grounded semantic protection. Such guardrails are a step toward audio interactions that are both intuitive and secure.
