PIIBench: Unified Benchmark for PII Detection in Text

PIIBench: A Unified Multi-Source Benchmark Corpus for Personally Identifiable Information Detection

Source: arXiv:2604.15776v1 (cross-listed)

Abstract: We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems.

Introduction

Detecting Personally Identifiable Information (PII) has become more important as concerns about data privacy and security have grown. However, the available resources are scattered across domains and annotated inconsistently, which makes it hard for researchers and developers to benchmark their systems against one another. To address this, we introduce PIIBench, a benchmark corpus that brings these resources under a single annotation scheme so that PII detection can be evaluated across domains.

Corpus Composition

PIIBench consolidates ten publicly available datasets, which include:

  • Synthetic PII corpora
  • Multilingual Named Entity Recognition (NER) benchmarks
  • Financial domain annotated text

This consolidation yields a corpus of 2,369,883 annotated sequences and 3.35 million entity mentions (about 1.4 mentions per sequence on average) across 48 canonical PII entity types. Because the sources differ in domain and label inventory, a detection system is evaluated on more than the single domain it was built for.

Normalization Pipeline

To ensure consistency across the dataset, we developed a principled normalization pipeline that includes:

  • Mapping 80+ source-specific label variants to a standardized BIO tagging scheme
  • Frequency-based suppression of near-absent entity types
  • Stratified train/validation/test splits preserving source distribution
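
The first two steps above can be illustrated with a minimal Python sketch. It is not the PIIBench implementation: the entries in LABEL_MAP, the min_count threshold, and the choice to send unmapped labels to "O" are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from source-specific labels to canonical types.
# The actual PIIBench table covers 80+ variants; these entries are illustrative.
LABEL_MAP = {
    "PER": "PERSON",
    "PERSON_NAME": "PERSON",
    "EMAIL_ADDRESS": "EMAIL",
    "EMAILADDR": "EMAIL",
    "FAX": "FAX_NUMBER",
}


def normalize_tag(tag: str) -> str:
    """Map one BIO tag (e.g. 'B-PER') onto the canonical label set."""
    if tag == "O":
        return "O"
    prefix, _, label = tag.partition("-")
    canonical = LABEL_MAP.get(label.upper())
    # Assumption: labels without a mapping are treated as non-entities.
    return f"{prefix}-{canonical}" if canonical else "O"


def suppress_rare(sequences, min_count):
    """Replace entity types with fewer than min_count mentions by 'O'."""
    # Each 'B-' tag opens exactly one mention, so counting them counts mentions.
    counts = Counter(
        tag[2:] for seq in sequences for tag in seq if tag.startswith("B-")
    )
    keep = {label for label, n in counts.items() if n >= min_count}
    return [
        [tag if tag == "O" or tag[2:] in keep else "O" for tag in seq]
        for seq in sequences
    ]


raw = [
    ["B-PER", "I-PER", "O", "B-EMAILADDR"],
    ["B-PERSON_NAME", "O", "B-FAX"],
]
normalized = [[normalize_tag(t) for t in seq] for seq in raw]
cleaned = suppress_rare(normalized, min_count=2)
```

In this toy input, PERSON occurs twice and survives, while EMAIL and FAX_NUMBER occur once each and are suppressed to "O".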

The pipeline places every source in a single label space, so systems can be compared directly, and the stratified splits keep each source represented in the train, validation, and test sets in proportion to its share of the corpus.
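
A source-stratified split can be sketched as below. The 80/10/10 ratios, the seed, and the "source" field name are assumptions for illustration; the summary does not state the values PIIBench uses.

```python
import random
from collections import defaultdict


def stratified_split(records, ratios=(0.8, 0.1, 0.1), seed=13):
    """Split records into train/validation/test, stratifying on record['source']."""
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "validation": [], "test": []}
    for source in sorted(by_source):
        group = by_source[source]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        # Each source contributes to every split in the same proportions.
        splits["train"] += group[:n_train]
        splits["validation"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]
    return splits


# Toy corpus: two sources of very different size (hypothetical names).
records = [{"source": "synthetic", "id": i} for i in range(100)]
records += [{"source": "financial", "id": i} for i in range(20)]
splits = stratified_split(records)
```

Splitting within each source, rather than over the pooled corpus, prevents a small source from ending up absent from the validation or test set by chance.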

Evaluation of Detection Systems

To establish baseline difficulty, we evaluated eight published systems that encompass a range of methodologies, including:

  • Rule-based engines (Microsoft Presidio)
  • General-purpose NER models (spaCy, BERT-base NER, XLM-RoBERTa NER, SpanMarker mBERT, SpanMarker BERT)
  • PII-specific models (Piiranha DeBERTa)
  • Financial NER specialists (XtremeDistil FiNER)

Despite the diversity in approaches, all evaluated systems achieved span-level F1 scores below 0.14. The best-performing system, Microsoft Presidio, reached an F1 score of 0.1385 but still exhibited zero recall on most entity types. The low scores suggest that systems built around one domain or label inventory cover only a small part of the 48-type label space.
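
For readers unfamiliar with the metric, span-level F1 can be computed from BIO tag sequences roughly as follows. This sketch assumes exact-match scoring (a predicted span counts only if start, end, and label all agree with a gold span) and micro-averaging over all sequences; the benchmark's own evaluation code may differ in these details.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) spans encoded by a BIO sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            spans.add((start, i, label))  # end index is exclusive
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged exact-match precision, recall, and F1 over all spans."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [["B-PERSON", "I-PERSON", "O", "B-EMAIL"]]
pred = [["B-PERSON", "I-PERSON", "O", "O"]]
precision, recall, f1 = span_f1(gold, pred)
```

In the toy example the system finds the PERSON span but has no output for EMAIL, so precision is perfect while recall is halved. An entity type a system never predicts adds only false negatives, which is how a system can score zero recall on most types and still post a non-zero overall F1.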

Conclusion

PIIBench presents a more comprehensive evaluation challenge than any existing single-source PII dataset, effectively quantifying the domain-silo problem in PII detection. The dataset construction pipeline and benchmark evaluation code are available publicly at https://github.com/pritesh-2711/pii-bench, encouraging further research and development in this critical area.


Lazarus Omolua