PeopleSearchBench: Benchmarking AI People Search Platforms


PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms

Summary (arXiv:2603.27476v1)

AI-powered people search platforms are now widely used in recruiting, sales prospecting, and professional networking. However, there is no widely accepted benchmark for evaluating them, which makes it hard to assess and compare these platforms. To fill this gap, the authors introduce PeopleSearchBench, an open-source benchmark that evaluates four prominent people search platforms on 119 real-world queries spanning four use cases.

The benchmark covers these four use cases:

  • Corporate Recruiting
  • B2B Sales Prospecting
  • Expert Search with Deterministic Answers
  • Influencer/KOL Discovery

A key contribution of PeopleSearchBench is Criteria-Grounded Verification, a factual relevance pipeline that extracts explicit, verifiable criteria from each query and then uses live web search to check whether each returned profile meets those criteria. The result is a binary relevance judgment grounded in factual verification, rather than a subjective, holistic quality rating produced by a large language model.
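
The article does not describe the pipeline's internals, so the following Python sketch only illustrates the general idea of turning per-criterion checks into one binary label per returned profile. Everything in it is hypothetical: the `Profile` type, the example criteria, `stub_web_verifier` (a set lookup standing in for a live web search), and the rule that a profile counts as relevant only if every criterion is verified. Criteria are assumed to have been extracted from the query already.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Profile:
    name: str
    # Hypothetical stand-in for facts a live web search could confirm.
    verified_facts: Set[str] = field(default_factory=set)


# A verifier answers: is there evidence that this profile meets this criterion?
Verifier = Callable[[str, Profile], bool]


def stub_web_verifier(criterion: str, profile: Profile) -> bool:
    # Placeholder for a live web-search check; here just a set lookup.
    return criterion in profile.verified_facts


def binary_relevance(criteria: List[str], profile: Profile, verify: Verifier) -> int:
    """Return 1 if every criterion is verified for the profile, else 0 (assumed AND rule)."""
    if not criteria:
        return 0  # no checkable criteria, so no grounded judgment is possible
    return int(all(verify(c, profile) for c in criteria))


def judge_results(criteria: List[str], results: List[Profile], verify: Verifier) -> List[int]:
    # One binary label per returned profile, kept in ranked order.
    return [binary_relevance(criteria, p, verify) for p in results]


criteria = ["role: ML engineer", "location: Berlin"]
results = [
    Profile("A", {"role: ML engineer", "location: Berlin"}),
    Profile("B", {"role: ML engineer"}),
]
print(judge_results(criteria, results, stub_web_verifier))  # [1, 0]
```

Keeping the ranked list of 0/1 labels makes it straightforward to feed the judgments into a rank-aware metric such as nDCG.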

Each system is evaluated along three dimensions:

  • Relevance Precision: measured with padded nDCG@10.
  • Effective Coverage: task completion rate and the yield of qualified results.
  • Information Utility: profile completeness and the overall usefulness of the returned information.
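
The summary does not define "padded" nDCG@10. One plausible reading, used in the Python sketch below purely as an assumption, is that a ranked list shorter than 10 is padded with irrelevant (0) entries and the ideal list is taken to be 10 relevant results, so returning fewer profiles cannot inflate the score. The function names are illustrative; relevance labels are binary, matching the verification step.

```python
import math
from typing import List


def dcg(rels: List[int]) -> float:
    # Standard DCG with a log2 discount: the item at 1-based rank r contributes rel / log2(r + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))


def padded_ndcg_at_k(rels: List[int], k: int = 10) -> float:
    """nDCG@k where missing results count as irrelevant (assumed padding rule).

    The ideal list is assumed to be k relevant results, so a system that
    returns fewer than k profiles cannot reach 1.0.
    """
    padded = (rels[:k] + [0] * k)[:k]  # truncate to k, or pad with zeros up to k
    ideal = dcg([1] * k)
    return dcg(padded) / ideal


print(padded_ndcg_at_k([1] * 10))   # 1.0
print(padded_ndcg_at_k([1, 1, 1]))  # roughly 0.47
```

Under this assumption, three correct profiles score about 0.47 while ten correct profiles score 1.0, which is the behavior padding is meant to produce: short but precise lists are not rewarded as if they were complete.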

Each system's overall score is the equal-weighted average of these three dimensions. Lessie, a specialized AI people search agent, ranked first with an overall score of 65.2, which is 18.5% higher than the second-ranked system. It was also the only platform to complete all 119 queries, a 100% task completion rate.
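
The aggregation and the relative-gap arithmetic can be sketched in a few lines of Python. The per-dimension scores are not given in this summary, so `overall_score` assumes all three dimensions are already on a common 0 to 100 scale; `implied_runner_up` assumes "18.5% higher" is measured relative to the second-ranked system's score. Both function names are hypothetical.

```python
def overall_score(relevance_precision: float,
                  effective_coverage: float,
                  information_utility: float) -> float:
    """Equal-weighted mean of the three dimension scores (assumed common 0-100 scale)."""
    return (relevance_precision + effective_coverage + information_utility) / 3


def implied_runner_up(top_score: float, pct_higher: float) -> float:
    """Invert 'top is pct_higher% above the runner-up': runner_up = top / (1 + pct/100)."""
    return top_score / (1 + pct_higher / 100)


# Derived from the reported figures; this is not a number stated in the article.
print(round(implied_runner_up(65.2, 18.5), 1))  # 55.0
```

So, under that reading, the second-ranked system scored roughly 55.0 overall.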

The study also reports confidence intervals, validates the verification pipeline against human judgments (Cohen's kappa = 0.84), and includes ablation studies. The queries, prompts, and normalization procedures are fully documented, which supports transparency and reproducibility.
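
Cohen's kappa measures agreement beyond chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected if both raters labeled at random with their own base rates. The Python sketch below computes it for binary relevance labels; the ten labels in the example are hypothetical and are not the paper's validation data.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters assigning binary (0/1) labels to the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n              # share of items each rater labels 1
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)        # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten profiles: human annotator vs. verification pipeline.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
pipeline = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(human, pipeline), 2))  # 0.8
```

Here the raters agree on 9 of 10 items (p_o = 0.9), but chance alone would produce 50% agreement (p_e = 0.5), so kappa is 0.8. This is why kappa is a stricter measure than raw agreement.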

The code, query definitions, and aggregated results are available on GitHub, so researchers and practitioners can use the benchmark and contribute to it.

Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.