When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP
Summary: Data scarcity limits NLP development for low-resource African languages. This study evaluates two data augmentation methods—LLM-based generation (Gemini 2.5 Flash) and back-translation (NLLB-200)—for Hausa and Fongbe, two West African languages that differ substantially in LLM generation quality.
Abstract
Data scarcity limits Natural Language Processing (NLP) for low-resource African languages. We evaluate two data augmentation techniques, LLM-based generation (Gemini 2.5 Flash) and back-translation (NLLB-200), on Hausa and Fongbe, two West African languages that differ substantially in LLM generation quality. We measure the effect of each technique on named entity recognition (NER) using the MasakhaNER 2.0 benchmark and on part-of-speech (POS) tagging using the MasakhaPOS benchmark.
Key Findings
Our results indicate that the effectiveness of data augmentation depends heavily on the task, and not only on the language or on the quality of the LLM's output. The main findings are:
- For named entity recognition, neither augmentation method improved on the baseline for either language. LLM augmentation lowered the F1 score by 0.24% for Hausa and by 1.81% for Fongbe.
- For part-of-speech tagging, the results were mixed. LLM augmentation improved Fongbe accuracy by 0.33%. Back-translation reduced Fongbe accuracy by 0.35% and had a negligible effect on Hausa, where it yielded a gain of only 0.17%.
- The same LLM-generated synthetic data had opposite effects across tasks for Fongbe: it hurt NER but helped POS tagging. This suggests that task structure matters more than synthetic data quality in determining augmentation outcomes.
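To make the sign pattern in these findings easier to inspect, the following minimal Python sketch tabulates only the deltas quoted above and applies a simple per-task selection rule. The names `REPORTED_DELTAS`, `choose_augmentation`, and `sign_flips` are hypothetical and are not taken from the study; cells the text does not quantify are left out rather than guessed, and in practice such a rule would be applied to development-set scores, not to final test results.

```python
# Deltas versus the unaugmented baseline, copied from the findings above.
# Units follow the text: F1 for NER, accuracy for POS.
# Combinations the text does not quantify are intentionally omitted.
REPORTED_DELTAS = {
    ("NER", "Hausa", "llm"): -0.24,
    ("NER", "Fongbe", "llm"): -1.81,
    ("POS", "Fongbe", "llm"): 0.33,
    ("POS", "Hausa", "back_translation"): 0.17,
    ("POS", "Fongbe", "back_translation"): -0.35,
}


def choose_augmentation(task, language, deltas=REPORTED_DELTAS):
    """Return the method with the largest positive delta, or "none".

    Illustrative rule only: augmentation is kept for a (task, language)
    pair only if it beats the baseline on that specific pair.
    """
    candidates = {
        method: delta
        for (t, lang, method), delta in deltas.items()
        if t == task and lang == language
    }
    if not candidates:
        return "none"
    best = max(candidates, key=candidates.get)
    return best if candidates[best] > 0 else "none"


def sign_flips(method, language, deltas=REPORTED_DELTAS):
    """True if the method helps on one task and hurts on another."""
    signs = {
        delta > 0
        for (_, lang, m), delta in deltas.items()
        if lang == language and m == method
    }
    return signs == {True, False}


if __name__ == "__main__":
    for task in ("NER", "POS"):
        for language in ("Hausa", "Fongbe"):
            print(task, language, choose_augmentation(task, language))
    print("LLM data flips sign for Fongbe:", sign_flips("llm", "Fongbe"))
```

Keying the table on (task, language, method) rather than on method alone reflects the point of the third finding: a single quality score per method or per language cannot represent a case where the same synthetic data helps one task and hurts another.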
Implications for Future Research
These findings challenge the assumption that high-quality LLM output guarantees successful data augmentation. They instead support treating data augmentation as a task-specific intervention: researchers and practitioners should check whether augmentation helps the specific task at hand before adopting it.
Conclusion
This study contributes to a clearer understanding of data augmentation in low-resource language settings. By comparing two augmentation methods on two tasks in Hausa and Fongbe, we provide evidence that can inform future research and applications in NLP for African languages. Our results underscore the need for tailored augmentation strategies: what works for one task or language may not work for another.
