IndiaFinBench: Benchmarking LLMs on Indian Finance Texts

IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text

Source: arXiv:2604.19298v1 (cross-listed)

Abstract

We introduce IndiaFinBench, the first publicly available evaluation benchmark specifically designed for assessing large language model (LLM) performance on Indian financial regulatory text. Traditional financial NLP benchmarks predominantly rely on Western financial corpora, such as SEC filings, US earnings reports, and English-language financial news. This focus leaves a significant gap in the coverage of non-Western regulatory frameworks, particularly those relevant to India.

Overview of IndiaFinBench

IndiaFinBench addresses this gap with 406 expert-annotated question-answer pairs derived from 192 documents published by Indian regulatory bodies, including the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI). The benchmark covers four task types:

  • Regulatory Interpretation: 174 items
  • Numerical Reasoning: 92 items
  • Contradiction Detection: 62 items
  • Temporal Reasoning: 78 items
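
As a quick sanity check on the composition above, the following minimal sketch sums the four task counts and computes each task's share of the benchmark. The variable names are illustrative and are not taken from the IndiaFinBench repository.

```python
# Item counts per task type, as reported in the list above.
task_counts = {
    "Regulatory Interpretation": 174,
    "Numerical Reasoning": 92,
    "Contradiction Detection": 62,
    "Temporal Reasoning": 78,
}

# The counts should add up to the 406 question-answer pairs reported.
total = sum(task_counts.values())

# Share of the benchmark contributed by each task, in percent.
task_share = {task: round(100 * n / total, 1) for task, n in task_counts.items()}

for task, n in task_counts.items():
    print(f"{task:<28}{n:>4}  {task_share[task]:>5.1f}%")
print(f"{'Total':<28}{total:>4}")
```

Regulatory interpretation accounts for roughly 43% of the items, so an overall accuracy figure is weighted most heavily toward that task.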

Annotation Quality

Annotation quality was checked in two ways. A model-based secondary pass reached a kappa of 0.918 on contradiction detection. A human inter-annotator agreement study on 60 items yielded a kappa of 0.611 with a raw agreement rate of 76.7%, which sits at the lower edge of the range conventionally read as substantial agreement. Together these checks support the reliability of the labels, although the human agreement figure suggests that some items remain open to interpretation.
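
The summary does not say which kappa statistic was used. Assuming it is Cohen's kappa for two annotators, the sketch below shows how the statistic corrects raw agreement for agreement expected by chance. The function name and the toy labels are illustrative and do not come from the benchmark.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data (NOT from IndiaFinBench): 10 items with binary labels.
annotator_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
annotator_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Under the same assumption, the reported figures imply the chance-agreement
# level: p_e = (p_o - kappa) / (1 - kappa).
implied_p_e = (0.767 - 0.611) / (1 - 0.611)
```

In the toy data the annotators agree on 8 of 10 items while chance agreement is 0.5, so kappa comes out near 0.6. Applied to the reported 76.7% agreement and kappa of 0.611, the same formula implies a chance-agreement level of roughly 0.40, again only if Cohen's kappa is the statistic in question.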

Model Evaluation

We evaluated twelve models under zero-shot conditions. Accuracy ranged from 70.4% for Gemma 4 E4B to 89.7% for Gemini 2.5 Flash, a spread of 19.3 percentage points. All twelve models outperformed a non-specialist human baseline, which scored 60.0%. This suggests that current LLMs handle Indian financial regulatory text better than readers without domain training.

Task Discrimination and Statistical Analysis

Numerical reasoning was the most discriminative task, with a 35.9 percentage-point spread in accuracy across the evaluated models. Bootstrap significance testing with 10,000 resamples identified three statistically distinct performance tiers among the models.
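
The summary does not describe the exact bootstrap procedure. One common choice for comparing two models on the same items is a paired bootstrap over items, sketched below under that assumption. The function name, the seed, and the synthetic correctness vectors are hypothetical and are not taken from the paper or its repository.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Paired bootstrap over items for the accuracy gap between two models.

    correct_a, correct_b: per-item 0/1 correctness on the same items.
    Returns (observed_gap, fraction of resamples where A does not beat B).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty, equal-length correctness lists")
    n = len(correct_a)
    rng = random.Random(seed)
    # Per-item difference: +1 if only A is right, -1 if only B is right.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed_gap = sum(diffs) / n
    not_better = 0
    for _ in range(n_resamples):
        # Resample items with replacement, keeping each item's pair intact.
        sample_gap = sum(diffs[rng.randrange(n)] for _ in range(n))
        if sample_gap <= 0:
            not_better += 1
    return observed_gap, not_better / n_resamples


# Synthetic example (NOT benchmark data): 100 items, A at 85%, B at 75%.
model_a = [1] * 85 + [0] * 15
model_b = [0] * 10 + [1] * 70 + [0] * 5 + [1] * 5 + [0] * 10
gap, p_value = paired_bootstrap(model_a, model_b)
print(f"observed gap = {gap:.2f}, bootstrap p = {p_value:.4f}")
```

A small returned fraction means the gap between the two models survives resampling of the items. Grouping models into tiers would then amount to finding sets of models whose pairwise gaps do not survive, though how the paper forms its three tiers is not stated here.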

Availability

The dataset, evaluation code, and all model outputs are publicly available at https://github.com/rajveerpall/IndiaFinBench. Researchers and practitioners in financial NLP are encouraged to use them to study how LLMs handle Indian financial regulation.


Author: Lazarus Omolua (https://richlyai.com/blog)