YMIR: A New Benchmark Dataset and Model for Arabic Yemeni Music Genre Classification Using Convolutional Neural Networks
arXiv:2604.05011v1
Automatic music genre classification is a significant task within the field of music information retrieval. However, most existing benchmarks and models predominantly cater to Western music, leaving culturally specific traditions, such as Yemeni music, underrepresented. In response to this gap, the research introduces the Yemeni Music Information Retrieval (YMIR) dataset.
About the YMIR Dataset
The YMIR dataset consists of 1,475 meticulously selected audio clips that encompass five traditional Yemeni genres:
- Sanaani
- Hadhrami
- Lahji
- Tihami
- Adeni
Each audio clip was labeled by five Yemeni music experts following a clear, structured protocol, which yielded strong inter-annotator agreement (Fleiss' kappa = 0.85). This labeling process supports the dataset’s reliability and cultural authenticity.
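For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' kappa can be computed from a table of rating counts. It is an illustration of the standard formula, not the authors' code; the function name `fleiss_kappa`, the input layout, and the three example clips are assumptions made for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical layout)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_chance = sum(s * s for s in shares)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1.0 - p_chance)


# Made-up example: 3 clips, 5 raters, columns ordered as
# Sanaani, Hadhrami, Lahji, Tihami, Adeni.
example = [
    [5, 0, 0, 0, 0],  # all five raters agree
    [0, 4, 1, 0, 0],  # one rater disagrees
    [0, 0, 0, 5, 0],  # all five raters agree
]
print(round(fleiss_kappa(example), 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.85 sits near the top of the scale.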
The Yemeni Music Classification Model (YMCM)
Alongside the dataset, the study proposes the Yemeni Music Classification Model (YMCM), a convolutional neural network (CNN) designed to classify music genres from time-frequency features. To keep results consistent and comparable, the same systematic preprocessing pipeline was applied across all experiments.
Experimental Setup
The research compared six experimental groups across five architectures, for a total of 30 experiments (6 × 5). Various feature representations were evaluated, including:
- Mel-spectrograms
- Chroma
- FilterBank
- Mel-frequency cepstral coefficients (MFCCs) with 13, 20, and 40 coefficients
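To make the relationship between these representations concrete, here is a minimal sketch of how MFCCs with 13, 20, or 40 coefficients can be derived from one frame of log-Mel energies. The summary does not give the paper's preprocessing settings, so the 40 Mel bands, the 0 to 8000 Hz range, the HTK-style Mel formula, and all function names below are assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced in Mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def mfcc_from_log_mel(log_mel_frame, n_mfcc):
    """Unnormalized DCT-II of one log-Mel frame, truncated to n_mfcc."""
    n = len(log_mel_frame)
    if not 0 < n_mfcc <= n:
        raise ValueError("n_mfcc must be between 1 and the number of Mel bands")
    return [
        sum(
            x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(log_mel_frame)
        )
        for k in range(n_mfcc)
    ]


# Hypothetical frame: 40 log-Mel energies (made-up values).
frame = [math.log(1.0 + i) for i in range(40)]
for n_mfcc in (13, 20, 40):
    coeffs = mfcc_from_log_mel(frame, n_mfcc)
    print(n_mfcc, len(coeffs))
```

Keeping fewer coefficients discards the fine spectral detail carried by the higher-order DCT terms, which is why the three MFCC sizes form separate feature sets rather than one.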
The YMCM was benchmarked against four reference models (AlexNet, VGG16, MobileNet, and a baseline CNN) under identical experimental conditions, so that differences in performance can be attributed to the architecture and feature set rather than to the setup.
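The experiment count can be reproduced by enumerating every pairing of feature set and architecture. The sketch below assumes that the six experimental groups correspond to the six feature representations listed above (counting each MFCC size separately); the identifier strings and the `experiment_grid` helper are hypothetical names, not taken from the paper.

```python
from itertools import product

# Assumed mapping: one experimental group per feature representation.
FEATURES = [
    "mel_spectrogram",
    "chroma",
    "filterbank",
    "mfcc_13",
    "mfcc_20",
    "mfcc_40",
]
ARCHITECTURES = ["YMCM", "AlexNet", "VGG16", "MobileNet", "baseline_cnn"]


def experiment_grid(features, architectures):
    """Return one configuration per (feature, architecture) pair."""
    return [
        {"feature": feature, "architecture": arch}
        for feature, arch in product(features, architectures)
    ]


grid = experiment_grid(FEATURES, ARCHITECTURES)
print(len(grid))  # 30
```

Running each configuration through the same preprocessing and training settings is what makes the 30 results directly comparable.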
Key Findings
The experimental results show that YMCM was the most effective of the models compared, reaching 98.8% accuracy with Mel-spectrogram features. The results also shed light on how feature representation interacts with model capacity in the classification of Yemeni music genres.
Conclusion
The findings establish the YMIR dataset as a benchmark and the YMCM as a strong baseline for the classification of Yemeni music genres. The work helps narrow the gap in music information retrieval for culturally specific traditions and opens avenues for further research in the field.
