Reducing Gender Bias in Bangla NLP Classification Tasks

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Summary: arXiv:2411.10636v2

Abstract: In this study, we investigate extrinsic gender bias in Bangla pretrained language models, an area that remains largely underexplored for low-resource languages. To assess this bias, we construct four manually annotated, task-specific benchmark datasets for sentiment analysis, toxicity detection, hate speech detection, and sarcasm detection.

Each dataset is augmented using nuanced gender perturbations, where we systematically swap gendered names and terms while preserving semantic content, enabling minimal-pair evaluation of gender-driven prediction shifts. We then propose RandSymKL, a randomized debiasing strategy integrated with symmetric KL divergence and cross-entropy loss to mitigate the bias across task-specific pretrained models.

Introduction

Extrinsic gender bias in natural language processing (NLP) occurs when a model's predictions on a downstream task change with the gender referenced in the input. It has received growing attention, mostly in high-resource languages, but remains significantly underexplored in low-resource languages such as Bangla. This study addresses that gap by focusing on Bangla pretrained language models.

Methodology

Our approach began with four manually annotated benchmark datasets, one for each classification task:

Sentiment Analysis
Toxicity Detection
Hate Speech Detection
Sarcasm Detection

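Each dataset was augmented with gender perturbations. Gendered names and terms were swapped while the rest of each sentence was left unchanged, which preserves its semantic meaning. Each original sentence and its perturbed counterpart therefore form a minimal pair, so any difference in a model's predictions on the two can be attributed to the change in gender.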
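The perturbation step can be sketched in a few lines. This sketch assumes a small, hypothetical lexicon of paired gendered Bangla terms and simple whitespace tokenization. It is illustrative only. The authors' datasets were manually annotated and also swap names, which this sketch does not handle. It will also miss inflected forms, such as nouns carrying a case suffix.

```python
# Hypothetical lexicon of paired gendered Bangla terms.
# Illustrative only; this is NOT the lexicon used by the authors.
GENDER_PAIRS = [
    ("ভাই", "বোন"),        # brother / sister
    ("বাবা", "মা"),        # father / mother
    ("ছেলে", "মেয়ে"),      # boy / girl
    ("স্বামী", "স্ত্রী"),    # husband / wife
]

# Build a swap table that works in both directions (male <-> female).
SWAP = {}
for male, female in GENDER_PAIRS:
    SWAP[male] = female
    SWAP[female] = male


def perturb_gender(sentence: str) -> str:
    """Swap every whitespace-delimited token found in SWAP; keep all others as-is."""
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())


def make_minimal_pair(sentence: str):
    """Return (original, perturbed) if at least one term was swapped, else None."""
    perturbed = perturb_gender(sentence)
    return (sentence, perturbed) if perturbed != sentence else None


print(make_minimal_pair("আমার ভাই খুব ভালো"))
```

The sketch rejoins tokens with single spaces, so it also normalizes whitespace. A realistic pipeline would need morphology-aware matching and a list of gendered names.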

Proposed Solution: RandSymKL

To mitigate these biases, we introduced RandSymKL, a novel debiasing strategy that combines:

Randomized perturbation techniques
Symmetric KL divergence
Cross-entropy loss
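Symmetric KL divergence is the standard Kullback-Leibler divergence summed in both directions. Below, P and Q denote predicted class distributions over C classes, for example on a sentence and its gender-swapped counterpart. That pairing is illustrative: this summary does not specify the exact pairing or scaling used in RandSymKL, and some formulations halve the sum.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{c=1}^{C} P(c) \log \frac{P(c)}{Q(c)}

D_{\mathrm{SymKL}}(P, Q) = D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P)
```

Plain KL is asymmetric, but D_SymKL gives the same value whichever argument comes first. It is zero exactly when P and Q are identical. This makes it a natural penalty for predictions that shift under a gender swap.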

By combining these components, RandSymKL aims to reduce extrinsic gender bias in classification tasks without compromising model accuracy.
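The sketch below shows, per example and in plain Python, how the three components might fit together. Several parts are hypothetical assumptions made for illustration:

- the randomization step, in which the cross-entropy term uses the gender-perturbed sentence with probability `p_swap`;
- the consistency weight `lam`;
- the function name `randsymkl_style_loss`.

It may differ from the authors' released implementation.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability of the gold label."""
    return -math.log(softmax(logits)[label])


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sym_kl(p: List[float], q: List[float]) -> float:
    """Symmetric KL: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def randsymkl_style_loss(
    logits_orig: List[float],
    logits_pert: List[float],
    label: int,
    lam: float = 1.0,
    p_swap: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical RandSymKL-style objective (illustrative, not the authors' code).

    - Randomization (assumed): with probability p_swap, the cross-entropy term
      is computed on the gender-perturbed sentence instead of the original.
    - Symmetric KL penalizes any difference between the predicted
      distributions on the original and perturbed sentences.
    """
    rng = rng or random.Random()
    ce_logits = logits_pert if rng.random() < p_swap else logits_orig
    ce = cross_entropy(ce_logits, label)
    consistency = sym_kl(softmax(logits_orig), softmax(logits_pert))
    return ce + lam * consistency


print(randsymkl_style_loss([2.0, 0.0], [0.5, 1.0], label=0, rng=random.Random(0)))
```

In training, `logits_orig` and `logits_pert` would come from the same model applied to a sentence and its gender-swapped counterpart. The label is shared because the swap should not change the correct class. Raising `lam` trades task fit for prediction consistency across genders.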

Evaluation and Results

RandSymKL was evaluated against existing bias mitigation strategies. The results show that:

RandSymKL significantly reduces extrinsic gender bias.
Task performance remains competitive with the baseline models.

This suggests that the strategy reduces bias while preserving the models' classification accuracy.
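The evaluation rests on measuring gender-driven prediction shifts across minimal pairs. Two simple, illustrative ways to quantify such shifts are:

- the label flip rate;
- the average total variation distance between the two predicted distributions.

These metrics and the function names below are assumptions. They are not necessarily the measures reported in the paper.

```python
from typing import List, Sequence, Tuple


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def prediction_shift(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Summarize gender-driven prediction shifts over minimal pairs.

    Each pair holds (probs_on_original, probs_on_gender_swapped) for one sentence.
    Returns (flip_rate, mean_total_variation):
      flip_rate             fraction of pairs whose predicted label changes
      mean_total_variation  average distance between the paired distributions
    """
    if not pairs:
        raise ValueError("need at least one minimal pair")
    flips = sum(argmax(p) != argmax(q) for p, q in pairs)
    tv = sum(total_variation(p, q) for p, q in pairs)
    return flips / len(pairs), tv / len(pairs)


print(prediction_shift([([0.9, 0.1], [0.8, 0.2]), ([0.6, 0.4], [0.3, 0.7])]))
```

Lower is better for both numbers. A model whose predictions ignore the gender swap entirely would score zero on both.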

Conclusion and Future Work

The findings of this study contribute to the ongoing discourse on bias in NLP, particularly in low-resource languages. By making our datasets and implementation publicly available at https://github.com/sajib-kumar/Mitigating-Bangla-Extrinsic-Gender-Bias, we aim to encourage further research in this critical area.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
