A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project
arXiv:2604.13042v1
Abstract
Semantic data harmonisation is a central requirement in the ILIAD project, where heterogeneous environmental data must be harmonised according to the Ocean Information Model (OIM), a modular family of ontologies that enables the implementation of interoperable Digital Twins of the Ocean. Existing approaches to semantic data harmonisation, such as RML and OTTR, offer valuable abstractions but require extensive knowledge of the technical intricacies of the OIM and the Semantic Web standards, including namespaces, IRIs, OWL constructors, and ontology design patterns.
Furthermore, RML and OTTR oblige practitioners to learn specialised syntaxes and dedicated tooling. Data scientists in ILIAD have found these approaches overly cumbersome and have therefore expressed the need for a solution that abstracts away these technical details while remaining seamlessly integrated into their Python-based environments. To address these requirements, we have developed a Pythonic functional approach to semantic data harmonisation that enables users to produce correct RDF through simple function calls.
Introduction
The ILIAD project aims to make ocean data easier to integrate and use by harmonising it according to a shared model, the OIM. Harmonisation matters because data from different sources can only be combined for research and practical applications once it is expressed in common terms.
Challenges in Existing Approaches
Current methodologies for semantic data harmonisation, while effective, present several challenges:
- Complexity: Practitioners need detailed knowledge of the OIM and of Semantic Web standards, including namespaces, IRIs, OWL constructors, and ontology design patterns.
- Specialised Knowledge Required: RML and OTTR each require learning a specialised syntax and dedicated tooling, which ILIAD data scientists have found overly cumbersome.
- Lack of Integration: Existing solutions often do not fit well within Python-centric workflows, limiting their usability in data science settings.
Our Solution
To overcome these challenges, we have designed a Pythonic functional approach that simplifies semantic data harmonisation. The approach is a set of functions organised into three levels of abstraction:
- Low-level Functions: These expose OWL and RDF syntax directly, allowing for granular control over data representation.
- Mid-level Functions: These encapsulate ontology design patterns, so a recurring modelling structure is defined once and reused.
- High-level Domain-specific Functions: These orchestrate data harmonisation tasks by calling mid-level functions, so users can produce correct RDF through simple function calls.
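The layering described above can be illustrated with a minimal sketch. It is not the ILIAD library's API: every function name (`iri`, `literal`, `triple`, `observation`, `sea_temperature`, `to_ntriples`) and the `http://example.org/data/` base IRI are hypothetical. The sketch also assumes a SOSA-style observation pattern as a stand-in for an OIM design pattern, which may differ from the patterns the OIM actually uses. It relies only on the Python standard library and writes N-Triples by hand, and it assumes timestamps are timezone-aware.

```python
from datetime import datetime, timezone

# RDF, XSD and SOSA are real W3C namespaces; EX is a hypothetical base IRI.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
SOSA = "http://www.w3.org/ns/sosa/"
EX = "http://example.org/data/"


# Low level: thin wrappers over RDF syntax (hypothetical names).
def iri(value):
    """Render an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype):
    """Render a typed literal as an N-Triples term."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def triple(subject, predicate, obj):
    """Build one triple from already-rendered terms."""
    return (subject, predicate, obj)


# Mid level: one design pattern (assumed here to be a SOSA-style observation).
def observation(obs_id, feature, prop, value, time):
    """Return all triples that make up a single observation."""
    obs = iri(EX + obs_id)
    return [
        triple(obs, iri(RDF + "type"), iri(SOSA + "Observation")),
        triple(obs, iri(SOSA + "hasFeatureOfInterest"), iri(EX + feature)),
        triple(obs, iri(SOSA + "observedProperty"), iri(EX + prop)),
        triple(obs, iri(SOSA + "hasSimpleResult"),
               literal(value, XSD + "double")),
        triple(obs, iri(SOSA + "resultTime"),
               literal(time.isoformat(), XSD + "dateTime")),
    ]


# High level: domain-specific entry point that hides IRIs and vocabularies.
def sea_temperature(station, celsius, time):
    """Harmonise one sea temperature reading (hypothetical domain function)."""
    obs_id = f"obs/{station}/{time:%Y%m%dT%H%M%S}"
    return observation(obs_id, f"station/{station}",
                       "property/seaTemperature", float(celsius), time)


def to_ntriples(triples):
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    when = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(to_ntriples(sea_temperature("buoy-42", 13.7, when)))
```

In this sketch a data scientist only calls `sea_temperature`; the namespaces, the choice of properties, and the literal datatypes stay inside the lower layers, which is the separation of concerns the three levels are meant to provide.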
User Feedback and Impact
Feedback from ILIAD data scientists indicates that the approach makes it easier for them to take part in data harmonisation. Because the functions hide the technical details behind ordinary Python calls, more practitioners can contribute to the project without first learning Semantic Web syntaxes and tooling.
Conclusion
Our Pythonic functional approach lowers the barrier to semantic data harmonisation and, in doing so, helps make environmental data more accessible and usable. As we continue to refine the approach, we aim to support collaboration in the ILIAD project and beyond, and to foster a more integrated understanding of ocean data across scientific communities.
