Reducing Gender Bias in Bangla NLP Classification Tasks

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Source: arXiv:2411.10636v2

Abstract: In this study, we investigate extrinsic gender bias in Bangla pretrained language models, a largely underexplored area in low-resource languages. To assess this bias, we construct four manually annotated, task-specific benchmark datasets for sentiment analysis, toxicity detection, hate speech detection, and sarcasm detection.

Each dataset is augmented using nuanced gender perturbations, where we systematically swap gendered names and terms while preserving semantic content, enabling minimal-pair evaluation of gender-driven prediction shifts. We then propose RandSymKL, a randomized debiasing strategy integrated with symmetric KL divergence and cross-entropy loss to mitigate the bias across task-specific pretrained models.

Introduction

Extrinsic gender bias, meaning bias that shows up in a model's downstream task predictions rather than in its internal representations, has gained increasing attention in natural language processing (NLP), particularly for high-resource languages. It remains largely underexplored in low-resource languages such as Bangla. This study aims to bridge that gap by focusing on Bangla pretrained language models.

Methodology

We built four manually annotated benchmark datasets, one for each classification task:

  • Sentiment Analysis
  • Toxicity Detection
  • Hate Speech Detection
  • Sarcasm Detection

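Each dataset was augmented with gender perturbations: gendered names and terms were systematically swapped while the rest of the sentence was left unchanged. The resulting minimal pairs differ only in gender, so a change in a model's prediction between the two versions can be attributed to the gender cue rather than to the semantic content.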
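The perturbation step can be illustrated with a minimal sketch. The swap table below is hypothetical and contains only three kinship pairs; the term and name lists used for the actual benchmarks are not given in this summary. The function names `perturb_gender` and `make_minimal_pair` are likewise illustrative. The sketch assumes whitespace tokenization, so it will miss inflected forms and tokens with attached punctuation.

```python
# Hypothetical swap table; the benchmark's real term and name lists are not shown here.
SWAP_PAIRS = [
    ("ভাই", "বোন"),        # brother <-> sister
    ("বাবা", "মা"),        # father <-> mother
    ("স্বামী", "স্ত্রী"),    # husband <-> wife
]

# Build a bidirectional lookup so each term maps to its counterpart.
SWAP = {}
for male_term, female_term in SWAP_PAIRS:
    SWAP[male_term] = female_term
    SWAP[female_term] = male_term


def perturb_gender(sentence: str) -> str:
    """Swap whole whitespace-delimited tokens that appear in SWAP.

    Tokens not in the table (including inflected or punctuated forms)
    are left unchanged, which is a deliberate simplification.
    """
    return " ".join(SWAP.get(token, token) for token in sentence.split())


def make_minimal_pair(sentence: str) -> tuple[str, str]:
    """Return (original, perturbed); the two differ only in gendered tokens."""
    return sentence, perturb_gender(sentence)


if __name__ == "__main__":
    original, perturbed = make_minimal_pair("আমার ভাই খুব ভালো")
    print(original)
    print(perturbed)
```

Because the lookup is bidirectional, applying `perturb_gender` twice returns the original sentence, which is a convenient sanity check when building such pairs.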

Proposed Solution: RandSymKL

To combat the identified biases, we introduced RandSymKL, a novel debiasing strategy. This approach combines:

  • Randomized perturbation techniques
  • Symmetric KL divergence
  • Cross-entropy loss

By integrating these components, RandSymKL offers a unified method for reducing extrinsic gender bias in classification tasks. Cross-entropy keeps each model fitted to its task labels, while the symmetric KL divergence term is used to limit gender-driven shifts in the predicted distribution. The aim is to mitigate bias without giving up classification accuracy.
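To make the combination of the three components concrete, here is a rough sketch of one way such a loss could be assembled for a single minimal pair. It is not the released implementation. Two points are assumptions on our part, because this summary does not spell them out: the randomization is modeled as a coin flip that decides which member of the pair receives the cross-entropy term, and the symmetric KL divergence is taken as the unscaled sum KL(p||q) + KL(q||p). The weight `lam` and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Assumed form: KL(p || q) + KL(q || p), without a 1/2 factor."""
    return kl(p, q) + kl(q, p)


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label] + eps)


def rand_sym_kl_loss(logits_orig, logits_pert, label, lam=1.0, rng=random):
    """Sketch of a combined loss for one (original, perturbed) pair.

    Assumption: a coin flip picks which member of the pair is scored
    with cross-entropy; the symmetric KL term always uses both.
    """
    p = softmax(logits_orig)
    q = softmax(logits_pert)
    chosen = p if rng.random() < 0.5 else q
    return cross_entropy(chosen, label) + lam * symmetric_kl(p, q)


if __name__ == "__main__":
    loss = rand_sym_kl_loss([2.0, 0.5], [0.3, 1.9], label=0, rng=random.Random(0))
    print(loss)
```

When the model produces identical logits for both versions of a sentence, the symmetric KL term is zero and the loss reduces to plain cross-entropy, so the penalty only applies when the gender swap changes the prediction.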

Evaluation and Results

We evaluated RandSymKL against existing bias mitigation methods. The results show that:

  • RandSymKL effectively reduces extrinsic gender bias.
  • Accuracy remains competitive with the baseline approaches.

The strategy therefore addresses bias without a material loss of accuracy on the classification tasks.
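As an illustration of how minimal pairs support this kind of comparison, the sketch below computes two quantities: the share of pairs whose predicted label changes after the gender swap, and ordinary accuracy. This summary does not name the bias metric used in the study, so the flip rate here is an assumed, commonly used stand-in, and the function names are hypothetical.

```python
def flip_rate(preds_original, preds_perturbed):
    """Fraction of minimal pairs whose predicted label changes after the swap."""
    if len(preds_original) != len(preds_perturbed):
        raise ValueError("prediction lists must have the same length")
    if not preds_original:
        return 0.0
    flips = sum(a != b for a, b in zip(preds_original, preds_perturbed))
    return flips / len(preds_original)


def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not preds:
        return 0.0
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


if __name__ == "__main__":
    # Toy predictions for four minimal pairs (made-up values for illustration).
    original = [1, 0, 1, 1]
    perturbed = [1, 1, 1, 0]
    gold = [1, 0, 0, 1]
    print("flip rate:", flip_rate(original, perturbed))
    print("accuracy:", accuracy(original, gold))
```

A debiasing method that works as intended should lower the flip rate while leaving accuracy close to that of the undebiased model.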

Conclusion and Future Work

The findings of this study contribute to the ongoing discourse on bias in NLP, particularly in low-resource languages. By making our datasets and implementation publicly available at https://github.com/sajib-kumar/Mitigating-Bangla-Extrinsic-Gender-Bias, we aim to encourage further research in this critical area.


Lazarus Omolua