Methods for Knowledge Graph Construction from Text Collections: Development and Applications
Summary: arXiv:2603.25862v1 Announce Type: cross
Abstract: Virtually every sector of society is experiencing dramatic growth in the volume of unstructured textual data that is generated and published, from news and online social media interactions to open access scholarly communications and observational data in the form of digital health records and online drug reviews. The volume and variety of data across these domains have created both unprecedented opportunities and pressing challenges for extracting actionable knowledge for several application scenarios.
However, extracting rich semantic knowledge demands scalable and flexible automatic methods that can be adapted across text genres and schema specifications. Moreover, the full potential of these data can only be unlocked by coupling information extraction methods with Semantic Web techniques for the construction of full-fledged Knowledge Graphs that are semantically transparent, explainable by design and interoperable.
In this thesis, we experiment with the application of Natural Language Processing, Machine Learning and Generative AI methods, powered by Semantic Web best practices, to the automatic construction of Knowledge Graphs from large text corpora, in three use case applications:
- The analysis of the Digital Transformation discourse in global news and on social media platforms.
- The mapping and trend analysis of recent research in the Architecture, Engineering, Construction and Operations domain from a large corpus of publications.
- The generation of causal relation graphs of biomedical entities from electronic health records and patient-authored drug reviews.
The contributions of this thesis to the research community include:
- Benchmark evaluation results that provide a solid foundation for future research.
- The design of customized algorithms tailored for specific Knowledge Graph construction tasks.
- The creation of data resources in the form of Knowledge Graphs that facilitate further research and application development.
- Data analysis results built on top of the constructed Knowledge Graphs, offering insights into various domains.
As the volume of unstructured data continues to grow, the methodologies developed in this thesis are a step towards using such data effectively. By combining AI techniques with Semantic Web principles, the research supports the construction of more robust Knowledge Graphs that can inform decision-making across diverse sectors.
In conclusion, integrating Natural Language Processing, Machine Learning, and Semantic Web technologies not only makes Knowledge Graph construction more efficient but also enriches the semantic depth of the extracted knowledge, leading to better-informed insights and applications in various fields.
