PrimeKG-CL: Benchmark for Continual Learning on Biomedical Graphs

PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs

A recent study published on arXiv introduces PrimeKG-CL, a benchmark for continual graph learning (CGL) on evolving biomedical knowledge graphs. These graphs underpin applications such as drug repurposing and clinical decision support, so models built on them need to keep pace as the underlying knowledge changes.

Biomedical knowledge graphs depend on upstream ontologies that update on independent cycles, which introduces millions of new edges and thousands of deprecated entries over time. Prior approaches to continual graph learning have largely relied on synthetic and static knowledge graphs, which fail to capture the complexity and asynchronous nature of real-world biomedical data evolution. PrimeKG-CL aims to address these limitations.

Key Features of PrimeKG-CL

Comprehensive Data Sources: The benchmark is constructed from nine authoritative biomedical databases, encompassing over 129,000 nodes and more than 8.1 million edges.
Temporal Snapshots: PrimeKG-CL includes two genuine temporal snapshots from June 2021 and July 2023, offering insights into the dynamic changes in the biomedical knowledge landscape, with 5.83 million edges added and 889,000 removed during this period.
Diverse Task Categories: The benchmark features ten entity-type-grouped tasks, allowing for a broad evaluation of various graph learning methods.
Multimodal Node Features: It incorporates multimodal node features, which improve performance on entity-level tasks by up to 60%.
Stratified Test Set: Each task includes a persistent/added/removed test stratification, which allows for a more nuanced evaluation of continual learning strategies.
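
The persistent/added/removed stratification described above can be illustrated with plain set operations on two snapshots. The following is a minimal sketch, not the benchmark's actual pipeline: the `Triple` type, the `stratify` function, and the toy edges are all hypothetical names and data introduced for illustration, and the real splits may be built differently.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Triple(NamedTuple):
    """One knowledge graph edge (hypothetical representation)."""
    head: str
    relation: str
    tail: str


def stratify(old_edges: Iterable[Triple],
             new_edges: Iterable[Triple]) -> Dict[str, Set[Triple]]:
    """Split edges from two snapshots into persistent, added, and removed strata."""
    old_set, new_set = set(old_edges), set(new_edges)
    return {
        "persistent": old_set & new_set,  # present in both snapshots
        "added": new_set - old_set,       # only in the later snapshot
        "removed": old_set - new_set,     # only in the earlier snapshot
    }


# Toy data standing in for the 2021 and 2023 snapshots (made up for illustration).
snapshot_2021 = [
    Triple("drug:A", "treats", "disease:X"),
    Triple("gene:G1", "associated_with", "disease:X"),
    Triple("drug:B", "treats", "disease:Y"),
]
snapshot_2023 = [
    Triple("drug:A", "treats", "disease:X"),
    Triple("gene:G1", "associated_with", "disease:X"),
    Triple("drug:C", "treats", "disease:Y"),
    Triple("gene:G2", "interacts_with", "gene:G1"),
]

strata = stratify(snapshot_2021, snapshot_2023)
for name, edges in strata.items():
    print(name, len(edges))
```

Scoring a model separately on each stratum is what lets an evaluation tell retention of still-valid facts apart from failure to forget removed ones.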

Evaluation and Findings

The research team conducted evaluations on three core tasks: biomedical relationship prediction, entity classification, and knowledge graph question answering (KGQA). They tested six continual learning strategies across four knowledge graph embedding (KGE) decoders, along with an LLM-RAG agent and CMKL.

One significant finding was the strong interaction between the choice of decoder and the continual learning strategy employed. The results indicated that no single strategy achieved optimal performance across all decoders, and mismatched combinations could severely degrade performance. Notably, only the DistMult decoder demonstrated a clear distinction between persistent and deprecated knowledge, highlighting an important issue in existing metrics that often conflate the retention of valid facts with the failure to forget outdated information. This problem was less pronounced with the RotatE decoder.
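
For readers unfamiliar with the two decoders named above, the sketch below shows their standard scoring functions as defined in the original DistMult and RotatE papers. It is a pure-Python illustration with made-up embedding values; the function names are hypothetical, the RotatE distance is taken as the sum of per-dimension moduli (a common implementation choice, assumed here), and nothing in it reflects how the PrimeKG-CL baselines are actually implemented.

```python
import cmath
import math
from typing import Sequence


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: trilinear product sum_i h_i * r_i * t_i (higher is more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_score(h: Sequence[complex], phases: Sequence[float],
                 t: Sequence[complex]) -> float:
    """RotatE: negative distance between the rotated head and the tail.

    Each relation dimension is a unit-modulus complex number exp(i * phase),
    so the relation acts as an element-wise rotation in the complex plane.
    """
    return -sum(abs(hi * cmath.exp(1j * p) - ti)
                for hi, p, ti in zip(h, phases, t))


# DistMult is symmetric in head and tail: swapping them leaves the score unchanged.
h, r, t = [1.0, 2.0], [0.5, -1.0], [3.0, 4.0]
forward = distmult_score(h, r, t)
backward = distmult_score(t, r, h)

# RotatE is direction-aware: a quarter-turn maps 1 to i, but not i back to 1.
h_c, quarter_turn, t_c = [1 + 0j], [math.pi / 2], [1j]
rot_forward = rotate_score(h_c, quarter_turn, t_c)
rot_backward = rotate_score(t_c, quarter_turn, h_c)

print(forward, backward)
print(rot_forward, rot_backward)
```

The two decoders make different structural assumptions about relations, which is one reason a continual learning strategy might interact differently with each; the sketch does not by itself explain the specific results reported in the study.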

Challenges and Future Directions

The study also reports a scalability problem: IncDE, a recent continual knowledge graph embedding (CKGE) framework, struggled to scale to the base task of 5.67 million triples and required excessive RAM across multiple attempts.

The data, pipeline, baselines, and stratified splits from this research have been made publicly available, encouraging further exploration and development in the field of continual graph learning in biomedical contexts. Researchers can access the dataset at huggingface.co/datasets/yradwan147/PrimeKGCL and the code at github.com/yradwan147/primekg-cl-neurips2026.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
