Echoes: A semantically-aligned music deepfake detection dataset
arXiv:2603.23667v1 (cross-listed)
AI music generation systems have made synthetic tracks easy to produce, and with them comes a pressing need to detect music deepfakes reliably. In response, researchers have introduced Echoes, a new dataset for detecting music deepfakes under realistic conditions.
Dataset Overview
Echoes is designed for training and benchmarking detectors that identify deepfake music. The dataset includes:
- 3,577 tracks
- 110 hours of audio
- Multiple genres, including pop, rock, and electronic
- Content generated by ten popular AI music generation systems
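As a quick sanity check on scale, the two totals above imply an average track length of a little under two minutes. The sketch below uses only those two figures; both are presumably rounded, so the result is approximate, and the function name is illustrative.

```python
# Back-of-the-envelope check using only the two totals quoted above.
# Both figures are presumably rounded, so treat the result as approximate.
TRACKS = 3577
HOURS = 110


def mean_track_seconds(total_hours: float, n_tracks: int) -> float:
    """Average duration per track, in seconds."""
    return total_hours * 3600 / n_tracks


avg = mean_track_seconds(HOURS, TRACKS)
print(f"about {avg:.0f} s per track ({avg / 60:.1f} min)")
# prints: about 111 s per track (1.8 min)
```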
Key Features of Echoes
The defining feature of Echoes is semantic-level alignment between generated tracks and bona fide references. The alignment is meant to prevent shortcut learning, in which a detector keys on incidental differences between real and fake examples rather than on traces of generation, and which hurts robustness and generalization. The dataset achieves this alignment through the following methods:
- Conditioning generated audio samples directly on bona-fide waveforms
- Conditioning generated audio samples on song descriptors
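One practical consequence of pairing generated audio with bona fide references is that a real track and its generated counterparts are closely related, so a user of such data may want to keep each group on the same side of a train/test split. The source does not describe how Echoes is split; the sketch below is an assumption about how one might handle paired data, and the `Clip` fields, generator names, and `split_by_source` helper are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: str
    source_id: str   # bona fide track this clip is, or was conditioned on
    label: str       # "real" or "fake"
    generator: str   # hypothetical provider name; empty for real clips


def split_by_source(clips, test_fraction=0.25, seed=0):
    """Split so each source track and its generated counterparts stay together."""
    groups = defaultdict(list)
    for clip in clips:
        groups[clip.source_id].append(clip)
    source_ids = sorted(groups)
    random.Random(seed).shuffle(source_ids)
    n_test = max(1, round(len(source_ids) * test_fraction))
    test_ids = set(source_ids[:n_test])
    train = [c for sid in source_ids if sid not in test_ids for c in groups[sid]]
    test = [c for sid in source_ids if sid in test_ids for c in groups[sid]]
    return train, test


# Toy manifest: 8 real tracks, each with 2 generated counterparts.
clips = []
for i in range(8):
    sid = f"song{i:02d}"
    clips.append(Clip(f"{sid}-real", sid, "real", ""))
    for gen in ("gen_a", "gen_b"):
        clips.append(Clip(f"{sid}-{gen}", sid, "fake", gen))

train, test = split_by_source(clips)
print(len(train), "train clips,", len(test), "test clips")
```

Grouping by `source_id` before shuffling is the design choice that matters here: a per-clip random split could place a real track in training and its conditioned counterpart in testing.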
Evaluation and Findings
The researchers evaluated Echoes in a cross-dataset setting against three existing AI-generated music datasets, using state-of-the-art Wav2Vec2 XLS-R 2B representations as the basis for their detection models. The results yielded several notable findings:
- Echoes is the hardest of the datasets in-domain, that is, when a detector is trained and tested on the same dataset.
- Detectors trained on existing datasets exhibited poor transferability to Echoes.
- Training on the Echoes dataset resulted in the strongest generalization performance for detection models.
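A cross-dataset evaluation of this kind can be read as a matrix: rows are training datasets, columns are test datasets, the diagonal is in-domain performance, and off-diagonal cells measure transfer. The sketch below illustrates only that protocol. Everything in it is an assumption for illustration: the 2-D synthetic "embeddings" stand in for real features, the nearest-centroid classifier stands in for a real detector, accuracy stands in for whatever metric the paper reports, and the dataset names are hypothetical. It does not reproduce the paper's results.

```python
import random


def make_toy_dataset(shift, n=200, seed=0):
    """Synthetic 2-D 'embeddings': fake clips are offset from real ones by `shift`."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # 0 = real, 1 = fake; balanced by construction
        x = [rng.gauss(label * shift, 1.0), rng.gauss(0.0, 1.0)]
        data.append((x, label))
    return data


def fit_centroids(data):
    """Toy detector: one mean vector per class."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for x, y in data:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


def cross_dataset_matrix(train_sets, test_sets):
    """matrix[train_name][test_name] = accuracy of a model fit on train_name."""
    matrix = {}
    for train_name, train_data in train_sets.items():
        model = fit_centroids(train_data)
        matrix[train_name] = {
            test_name: accuracy(model, test_data)
            for test_name, test_data in test_sets.items()
        }
    return matrix


# Hypothetical datasets: a larger shift makes real and fake easier to separate.
shifts = {"easy_set": 4.0, "hard_set": 1.0}
train_sets = {name: make_toy_dataset(s, seed=1) for name, s in shifts.items()}
test_sets = {name: make_toy_dataset(s, seed=2) for name, s in shifts.items()}

matrix = cross_dataset_matrix(train_sets, test_sets)
for train_name, row in matrix.items():
    cells = "  ".join(f"{test_name}={acc:.2f}" for test_name, acc in row.items())
    print(f"train on {train_name}: {cells}")
```

In this toy setup, "hardest in-domain" corresponds to the lowest diagonal cell, and "poor transferability" to low values in a dataset's column for rows trained elsewhere.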
Implications for Future Research
The findings from the Echoes dataset underscore the importance of provider diversity and semantic alignment in developing effective music deepfake detectors. As AI-generated music continues to evolve, the insights gained from Echoes could pave the way for more robust detection systems that can adapt to new challenges in the field.
Conclusion
With Echoes, researchers have a harder and more realistic benchmark for music deepfake detection. By emphasizing realistic conditions and semantic alignment, the dataset is a meaningful step toward detectors that keep working as AI-generated music spreads through the music industry.
