A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness
arXiv:2604.10853v1
Abstract
Task-oriented evaluation of knowledge graph (KG) quality increasingly asks whether an ontology-based representation can answer the competency questions that users actually care about, in a manner that is reproducible, explainable, and traceable to evidence. This paper adopts that perspective and focuses on gap and overlap analysis for policy-like documents (e.g., insurance contracts): given a scenario, determine which documents support it (overlap) and which do not (gap), with defensible justifications.
Because these gap/overlap determinations are typically driven by genuine differences in coverage and restrictions, not by missing data, the task directly tests KG task readiness rather than data completeness or query expressiveness.
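As a minimal sketch of what the task outputs, assume each contract has already been labeled as supporting or not supporting a given scenario. The gap/overlap answer is then a partition of the contract set. The function name, contract identifiers, and boolean label format below are hypothetical and not taken from the benchmark itself.

```python
from typing import Dict, Set, Tuple


def gap_overlap(outcomes: Dict[str, bool]) -> Tuple[Set[str], Set[str]]:
    """Partition contracts into overlap (supports the scenario) and gap (does not).

    `outcomes` is a hypothetical format: contract id -> True if the contract
    supports the scenario, False otherwise.
    """
    overlap = {contract for contract, supported in outcomes.items() if supported}
    gap = {contract for contract, supported in outcomes.items() if not supported}
    return overlap, gap


# Hypothetical outcomes for a single scenario over three contracts.
outcomes = {"contract_A": True, "contract_B": False, "contract_C": True}
overlap, gap = gap_overlap(outcomes)
print(sorted(overlap))  # ['contract_A', 'contract_C']
print(sorted(gap))      # ['contract_B']
```

The hard part of the task is producing the per-contract labels with justifications; once those exist, the split itself is trivial, which is why the benchmark pairs every label with evidence.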
We present an executable and auditable benchmark that aligns natural-language contract text with a formal ontology and evidence-linked ground truth, enabling systematic comparison of methods. The benchmark includes:
- Ten simplified yet diverse life-insurance contracts reviewed by a domain expert.
- A domain ontology (TBox) with an instantiated knowledge base (ABox) populated from contract facts.
- Fifty-eight structured scenarios, each paired with a SPARQL query, contract-level outcomes, and clause-level excerpts that justify each label.
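One way to picture a single evidence-linked benchmark item is as a record that ties a scenario to its query, its contract-level labels, and the clause excerpts behind them. The sketch below assumes a simple in-memory layout; the class name, field names, example scenario text, and placeholder query and excerpt are all hypothetical and do not reflect the benchmark's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scenario:
    """Hypothetical record for one benchmark scenario (field names are illustrative)."""
    scenario_id: str
    description: str
    sparql: str                  # query to be answered over the instantiated KG
    outcomes: Dict[str, bool]    # contract id -> does the contract support the scenario?
    evidence: Dict[str, List[str]] = field(default_factory=dict)  # contract id -> clause excerpts

    def unjustified_labels(self) -> List[str]:
        """Return contracts whose label has no supporting clause excerpt (an audit check)."""
        return sorted(c for c in self.outcomes if not self.evidence.get(c))


# Hypothetical scenario; the query and excerpt are placeholders, not benchmark content.
s = Scenario(
    scenario_id="S01",
    description="Insured dies abroad within the first policy year",
    sparql="SELECT ?contract WHERE { ... }",
    outcomes={"contract_A": True, "contract_B": False},
    evidence={"contract_A": ["<excerpt from the relevant clause>"]},
)
print(s.unjustified_labels())  # ['contract_B']
```

Keeping evidence next to each label makes auditability checkable: any label without an excerpt is flagged rather than silently accepted.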
Methodology and Findings
Using this resource, we compare a text-only LLM baseline that infers outcomes directly from contract text against an ontology-driven pipeline that answers the same scenarios over the instantiated KG. The results demonstrate that explicit ontological modeling makes gap/overlap analyses more consistent and their errors easier to diagnose.
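A comparison of this kind needs a shared scoring step applied to both methods. The sketch below assumes contract-level accuracy against the gold labels as the metric, with missing predictions counted as errors; the metric choice, function name, and all label values are illustrative assumptions, and the toy numbers are not results reported in the paper.

```python
from typing import Dict

# Hypothetical layout: scenario id -> contract id -> supports the scenario?
Labels = Dict[str, Dict[str, bool]]


def contract_level_accuracy(predicted: Labels, gold: Labels) -> float:
    """Fraction of gold (scenario, contract) labels that a method reproduces.

    A missing prediction counts as an error.
    """
    total = 0
    correct = 0
    for scenario_id, gold_outcomes in gold.items():
        pred_outcomes = predicted.get(scenario_id, {})
        for contract_id, gold_label in gold_outcomes.items():
            total += 1
            if pred_outcomes.get(contract_id) == gold_label:
                correct += 1
    return correct / total if total else 0.0


# Toy labels invented for illustration only.
gold = {"S01": {"A": True, "B": False}, "S02": {"A": False, "B": True}}
text_only = {"S01": {"A": True, "B": True}, "S02": {"A": False}}  # one wrong, one missing
kg_pipeline = {"S01": {"A": True, "B": False}, "S02": {"A": True, "B": True}}

print(contract_level_accuracy(text_only, gold))    # 0.5
print(contract_level_accuracy(kg_pipeline, gold))  # 0.75
```

Scoring at the (scenario, contract) level, rather than per scenario, keeps partial answers visible and points directly at the contract whose label needs diagnosis.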
Implications and Future Work
Although demonstrated for gap and overlap analysis, the benchmark is intended as a reusable template for evaluating KG quality and supporting downstream work such as ontology learning, KG population, and evidence-grounded question answering. Used this way, it exposes the coverage and limitations of a knowledge graph and offers a structured methodology for improving knowledge representation quality in other application domains.
Conclusion
The proposed benchmark advances the task-oriented evaluation of knowledge graphs. By focusing on gap and overlap analysis, it makes it possible to assess whether a KG is ready to meet users' needs in real-world scenarios, and it lays the groundwork for future research on KG quality assessment and ontology-driven applications.
