Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus
arXiv:2604.13288v1 (cross-listed)
This paper presents an approach to synthesizing high-quality speech for the Peruvian Constitution in both Quechua and Spanish. To address the challenges of working with a low-resource language, it uses three text-to-speech (TTS) architectures: XTTS v2, F5-TTS, and DiFlow-TTS.
Abstract Overview
The study introduces a unified pipeline that produces speech for the Peruvian Constitution in both languages, a step toward making legal texts accessible to bilingual audiences. The models are trained to handle both Spanish and Quechua, whose datasets differ in size and in recording conditions.
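The summary does not describe how the recordings were preprocessed. When speech recorded in different environments is pooled for training, one common step is to bring every clip to the same loudness first. The sketch below scales a mono waveform to a target RMS level. The function names and the -23 dBFS default are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical preprocessing step; the paper's actual pipeline is not described here.
DEFAULT_TARGET_DBFS = -23.0  # illustrative target level (an assumption)


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if len(samples) == 0:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor the RMS so an all-zero clip does not hit log10(0).
    return 20.0 * math.log10(max(rms, 1e-12))


def normalize_rms(samples: Sequence[float], target_dbfs: float = DEFAULT_TARGET_DBFS) -> List[float]:
    """Scale samples so their RMS level equals target_dbfs.

    Results are clipped to [-1.0, 1.0]; if clipping occurs, the final RMS
    ends up slightly below the target.
    """
    gain = 10.0 ** ((target_dbfs - rms_dbfs(samples)) / 20.0)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Plain RMS scaling is the simplest option. It does not model perceived loudness the way ITU-R BS.1770 (LUFS) meters do. Near-silent clips should also be filtered out beforehand, because the large gain they receive would amplify background noise.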
Methodology
The researchers trained on independent speech datasets for each language, which lets the pipeline draw on the bilingual and multilingual capabilities of the TTS models. The approach focuses on:
- Training on datasets that reflect the characteristics of each language.
- Applying cross-lingual transfer to improve synthesis quality for Quechua, for which little speech data is available.
- Preserving the naturalness and clarity of the Spanish synthesis so that the audio remains easy to follow.
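The summary does not specify how training handles the size gap between the Spanish and Quechua datasets. A widely used technique in multilingual model training is temperature-based sampling. With this method, a language with n_i hours of data is drawn with probability proportional to n_i^(1/T). The sketch below implements that rule. It is a swapped-in illustration, not the authors' documented method, and the hour counts used to exercise it are hypothetical.

```python
import random
from typing import Dict


def language_sampling_probs(hours: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Temperature-scaled sampling weights: p_i is proportional to n_i ** (1 / T).

    T = 1 reproduces the raw data proportions; larger T moves the weights
    toward uniform, which upsamples the smaller (low-resource) language.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if any(n <= 0 for n in hours.values()):
        raise ValueError("every language needs a positive amount of data")
    scaled = {lang: n ** (1.0 / temperature) for lang, n in hours.items()}
    total = sum(scaled.values())
    return {lang: w / total for lang, w in scaled.items()}


def sample_language(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw the language of the next training batch according to probs."""
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=1)[0]
```

At T = 1 the sampler follows the raw data proportions. Raising T pushes the distribution toward uniform, so the model sees the smaller language more often. The trade-off is that the same Quechua utterances are repeated more frequently, which can encourage overfitting.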
Impact on Indigenous and Multilingual Contexts
This project holds particular significance for indigenous communities and multilingual environments. By providing synthesized audio for each article of the Peruvian Constitution, the initiative aims to:
- Support legal literacy and awareness among Quechua speakers.
- Improve the representation of indigenous languages in language technology.
- Facilitate better understanding of legal content among diverse populations.
Resource Availability
To support further research in this area, the authors have made the following resources publicly available:
- Trained checkpoints for each TTS architecture.
- Inference code for running and evaluating the models.
- Synthesized audio files for each article of the Peruvian Constitution.
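The summary does not describe the released inference code, so the sketch below does not reproduce its interface. Instead, it shows how a per-article batch job could be organized, under several assumptions. Articles are assumed to begin with headings of the form "Artículo N.-"; this hypothetical pattern would need adjusting, especially for the Quechua text. `Synthesizer` is a hypothetical interface that a wrapper around XTTS v2, F5-TTS, or DiFlow-TTS would implement. `SilentSynthesizer` is a placeholder that writes silence, so the loop can be tested without a model.

```python
import re
import wave
from pathlib import Path
from typing import Dict, List, Protocol, Tuple

# Hypothetical heading pattern such as "Artículo 1.-" or "Artículo 2°.-";
# the real Spanish and Quechua source texts may be formatted differently.
ARTICLE_RE = re.compile(r"^Art[ií]culo\s+(\d+)\s*°?\s*\.?-?", re.MULTILINE)


class Synthesizer(Protocol):
    """Hypothetical interface that a wrapper around a TTS model would implement."""

    sample_rate: int

    def synthesize(self, text: str, language: str) -> bytes:
        """Return 16-bit mono PCM audio for text."""
        ...


class SilentSynthesizer:
    """Placeholder backend returning 0.1 s of silence per word, for testing the loop."""

    sample_rate = 16000

    def synthesize(self, text: str, language: str) -> bytes:
        n_samples = (self.sample_rate // 10) * max(1, len(text.split()))
        return b"\x00\x00" * n_samples  # two zero bytes per 16-bit sample


def split_articles(text: str, pattern: "re.Pattern[str]" = ARTICLE_RE) -> List[Tuple[int, str]]:
    """Split a constitution text into (article number, article body) pairs."""
    matches = list(pattern.finditer(text))
    articles = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles.append((int(match.group(1)), text[match.end():end].strip()))
    return articles


def synthesize_articles(text: str, language: str, tts: Synthesizer, out_dir: Path) -> Dict[int, Path]:
    """Write one WAV file per article and return a map from article number to file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[int, Path] = {}
    for number, body in split_articles(text):
        path = out_dir / f"{language}_article_{number:03d}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)  # mono
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(tts.sample_rate)
            wav.writeframes(tts.synthesize(body, language))
        written[number] = path
    return written
```

Keeping the model behind a small interface lets the same loop produce one WAV file per article for each of the three architectures and for both languages. The codes "es" and "qu" are the ISO 639-1 codes for Spanish and Quechua.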
Conclusion
This work contributes to the development of inclusive text-to-speech systems for legal and political content in low-resource settings. By making the constitution available as audio in both languages, it gives Quechua and Spanish speakers another way to engage with their constitution.
More broadly, projects like this highlight the importance of addressing language diversity and of ensuring equitable access to information across linguistic communities.
