Generating Knowledge Graphs from Cultural Heritage Texts

Knowledge Graphs Generation from Cultural Heritage Texts: Combining LLMs and Ontological Engineering for Scholarly Debates

Source: arXiv:2511.10354v1

Abstract

Cultural Heritage texts contain rich knowledge that is difficult to query systematically due to the challenges of converting unstructured discourse into structured Knowledge Graphs (KGs). This paper introduces ATR4CH (Adaptive Text-to-RDF for Cultural Heritage), a systematic five-step methodology for Large Language Model-based Knowledge Extraction from Cultural Heritage documents. We validate the methodology through a case study on authenticity assessment debates.

Methodology

ATR4CH combines annotation models, ontological frameworks, and LLM-based extraction through iterative development. The methodology has five steps:

  • Foundational Analysis: Understanding the context and content of Cultural Heritage texts.
  • Annotation Schema Development: Creating structured formats for data extraction.
  • Pipeline Architecture: Designing the workflow for data processing.
  • Integration Refinement: Improving the integration of various components.
  • Comprehensive Evaluation: Assessing the effectiveness of the methodology.

We demonstrate the approach using Wikipedia articles about disputed items (documents, artifacts, etc.), implementing a sequential pipeline with three LLMs: Claude Sonnet 3.7, Llama 3.3 70B, and GPT-4o-mini.
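A sequential pipeline of this kind can be sketched as a chain of stages, each reading what earlier stages produced. The sketch below is only an illustration under stated assumptions: the `Extraction` container, the stage functions, and the sample sentence are all hypothetical, the stage order is a guess based on the extraction tasks reported in the findings, and the stubs stand in for what would be LLM calls in the real system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Extraction:
    """Accumulates the output of each stage for one article (hypothetical container)."""
    text: str
    metadata: dict = field(default_factory=dict)
    entities: list = field(default_factory=list)
    hypotheses: list = field(default_factory=list)


# A stage takes the state built so far and returns the updated state.
Stage = Callable[[Extraction], Extraction]


def run_pipeline(text: str, stages: list[Stage]) -> Extraction:
    """Run the stages in order, so later stages can use earlier results."""
    state = Extraction(text=text)
    for stage in stages:
        state = stage(state)
    return state


# Stub stages: plain string handling replaces the LLM prompts (assumption).
def extract_metadata(state: Extraction) -> Extraction:
    state.metadata = {"title": state.text.split(".")[0].strip()}
    return state


def extract_entities(state: Extraction) -> Extraction:
    # Reuses the metadata stage's output, illustrating the sequential dependency.
    state.entities = [state.metadata["title"]]
    return state


def extract_hypotheses(state: Extraction) -> Extraction:
    state.hypotheses = [
        {"about": entity, "claim": "authenticity disputed"}
        for entity in state.entities
    ]
    return state


# Made-up sample text, not taken from the paper's dataset.
sample = "Vinland Map. Some scholars argue it is a forgery."
result = run_pipeline(sample, [extract_metadata, extract_entities, extract_hypotheses])
print(result.hypotheses)
```

Keeping each stage a plain function makes it straightforward to swap the model behind one stage without touching the others.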

Findings

The methodology extracts complex Cultural Heritage knowledge with the following scores:

  • Metadata Extraction: F1 score of 0.96 to 0.99
  • Entity Recognition: F1 score of 0.7 to 0.8
  • Hypothesis Extraction: F1 score of 0.65 to 0.75
  • Evidence Extraction: F1 score of 0.95 to 0.97
  • Discourse Representation: G-EVAL score of 0.62

Smaller models performed competitively, which enables cost-effective deployment for institutions.
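For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over sets of extracted items; treating a prediction as correct only on exact match is an assumption made here for simplicity, and the paper's actual matching criteria may differ. The item names are invented.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Return F1 of a predicted set against a gold set, using exact matching."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example: three of four predicted items appear in the gold set.
gold = {"parchment", "ink", "carbon dating", "provenance"}
predicted = {"parchment", "ink", "carbon dating", "watermark"}
print(round(f1_score(predicted, gold), 2))  # precision = recall = 0.75, so F1 = 0.75
```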

Originality

This research presents the first systematic methodology for coordinating LLM-based extraction with Cultural Heritage ontologies. ATR4CH provides a replicable framework that is adaptable across various Cultural Heritage domains and institutional resources.

Research Limitations

The produced Knowledge Graph is limited to Wikipedia articles. While the results are encouraging, human oversight is necessary during post-processing to ensure accuracy and completeness of the extracted knowledge.

Practical Implications

ATR4CH enables Cultural Heritage institutions to systematically convert textual knowledge into queryable Knowledge Graphs, supporting automated metadata enrichment and knowledge discovery.
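To make "queryable" concrete, a Knowledge Graph can be viewed as a set of subject-predicate-object triples that are searched by pattern. The sketch below uses only the Python standard library; the `ex:` identifiers and property names are hypothetical and do not come from the ontology used in the paper, and a production system would more likely use an RDF store queried with SPARQL.

```python
Triple = tuple[str, str, str]

# Illustrative triples with made-up identifiers (not the paper's vocabulary).
triples: list[Triple] = [
    ("ex:item1", "ex:hasHypothesis", "ex:hyp1"),
    ("ex:hyp1", "ex:claims", "forgery"),
    ("ex:hyp1", "ex:supportedBy", "ex:ev1"),
    ("ex:ev1", "ex:description", "ink analysis"),
    ("ex:item1", "ex:hasHypothesis", "ex:hyp2"),
    ("ex:hyp2", "ex:claims", "authentic"),
]


def match(pattern: tuple, store: list[Triple]) -> list[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def hypotheses_for(item: str, store: list[Triple]) -> dict:
    """Map each hypothesis attached to an item to the claims it makes."""
    result = {}
    for _, _, hypothesis in match((item, "ex:hasHypothesis", None), store):
        result[hypothesis] = [
            claim for _, _, claim in match((hypothesis, "ex:claims", None), store)
        ]
    return result


print(hypotheses_for("ex:item1", triples))
```

The same pattern-matching idea underlies a SPARQL basic graph pattern, where variables play the role of the `None` wildcards.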


Lazarus Omolua, https://richlyai.com/blog