PrimeKG-CL: Benchmark for Continual Learning on Biomedical Graphs

PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs

A recent study published on arXiv introduces PrimeKG-CL, a benchmark for continual graph learning (CGL) on evolving biomedical knowledge graphs. Because these graphs underpin applications such as drug repurposing and clinical decision support, understanding how they change over time is essential to keeping them useful.

Biomedical knowledge graphs depend on upstream ontologies that update on independent cycles, so over time they gain millions of new edges and accumulate thousands of deprecated entries. Prior work on continual graph learning has largely relied on synthetic and static knowledge graphs, which fail to capture the asynchronous way real-world biomedical data evolves. PrimeKG-CL aims to address these limitations.

Key Features of PrimeKG-CL

  • Comprehensive Data Sources: The benchmark is constructed from nine authoritative biomedical databases, encompassing over 129,000 nodes and more than 8.1 million edges.
  • Temporal Snapshots: PrimeKG-CL includes two genuine temporal snapshots, from June 2021 and July 2023. Between them, 5.83 million edges were added and 889,000 were removed.
  • Diverse Task Categories: The benchmark features ten entity-type-grouped tasks, allowing for a broad evaluation of various graph learning methods.
  • Multimodal Node Features: The benchmark provides multimodal node features, which improve performance on entity-level tasks by up to 60%.
  • Stratified Test Set: Each task's test set is stratified into persistent, added, and removed strata, allowing a more nuanced evaluation of continual learning strategies.
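
One way to picture the persistent/added/removed stratification is as a set comparison between the two snapshots. The following is a minimal sketch under that assumption, not the benchmark's actual pipeline; the function name stratify_edges and the toy triples are hypothetical and not taken from PrimeKG-CL.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def stratify_edges(old: Set[Triple], new: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Split edges into three strata by comparing two snapshots (hypothetical helper)."""
    return {
        "persistent": old & new,  # present in both snapshots
        "added": new - old,       # only in the later snapshot
        "removed": old - new,     # only in the earlier snapshot (deprecated)
    }


# Toy triples for illustration only; not real PrimeKG-CL data.
snapshot_2021 = {
    ("drug_a", "treats", "disease_x"),
    ("gene_g", "associated_with", "disease_x"),
    ("drug_b", "treats", "disease_y"),
}
snapshot_2023 = {
    ("drug_a", "treats", "disease_x"),
    ("gene_g", "associated_with", "disease_x"),
    ("drug_c", "treats", "disease_y"),
}

strata = stratify_edges(snapshot_2021, snapshot_2023)
for name, edges in strata.items():
    print(name, len(edges))
```

Keeping the three strata separate at test time is what lets an evaluation ask different questions of each: whether persistent facts are retained, whether added facts are learned, and whether removed facts are forgotten.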

Evaluation and Findings

The authors evaluated three core tasks: biomedical relationship prediction, entity classification, and knowledge graph question answering (KGQA). They tested six continual learning strategies across four knowledge graph embedding (KGE) decoders, along with an LLM-RAG agent and CMKL.

One significant finding was a strong interaction between the choice of decoder and the continual learning strategy: no single strategy performed best across all decoders, and mismatched combinations could severely degrade results. Notably, only the DistMult decoder clearly distinguished persistent from deprecated knowledge. This points to a weakness in existing metrics, which often conflate retaining valid facts with failing to forget outdated ones. This problem was less pronounced with the RotatE decoder.
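
To make the metric issue concrete, the sketch below scores triples with the standard DistMult function (the sum over dimensions of the element-wise product of head, relation, and tail embeddings) and reports the mean score per stratum instead of one pooled number. It is an illustration only, not the paper's evaluation code: the helper names, the hand-picked two-dimensional embeddings, and the toy triples are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """Standard DistMult score: sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def mean_score_by_stratum(
    strata: Dict[str, List[Triple]],
    entity_emb: Dict[str, Vector],
    relation_emb: Dict[str, Vector],
) -> Dict[str, float]:
    """Average the triple scores within each stratum (hypothetical helper)."""
    return {
        name: mean(
            distmult_score(entity_emb[h], relation_emb[r], entity_emb[t])
            for h, r, t in triples
        )
        for name, triples in strata.items()
    }


# Hand-picked toy embeddings and triples; illustrative assumptions only.
entity_emb = {
    "drug_a": [1.0, 0.0],
    "drug_b": [0.0, 1.0],
    "disease_x": [1.0, 0.0],
}
relation_emb = {"treats": [1.0, 1.0]}
strata = {
    "persistent": [("drug_a", "treats", "disease_x")],
    "removed": [("drug_b", "treats", "disease_x")],
}

scores = mean_score_by_stratum(strata, entity_emb, relation_emb)
# A pooled average would hide this gap between still-valid and deprecated facts.
gap = scores["persistent"] - scores["removed"]
print(scores, gap)
```

Reporting the strata separately is the design choice that matters here: a model that keeps scoring removed facts highly would look good on a pooled retention metric while actually failing to forget outdated knowledge.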

Challenges and Future Directions

Despite the promising results, the study ran into scaling problems. IncDE, a recent continual knowledge graph embedding (CKGE) framework, could not scale to the 5.67-million-triple base task: it required excessive RAM across multiple attempts.

The data, pipeline, baselines, and stratified splits from this research have been made publicly available, encouraging further exploration and development in the field of continual graph learning in biomedical contexts. Researchers can access the dataset at huggingface.co/datasets/yradwan147/PrimeKGCL and the code at github.com/yradwan147/primekg-cl-neurips2026.

Lazarus Omolua