Enhancing Low-Resource Language Digital Representation with Knowledge Graphs

In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs

The rise of digital technologies has transformed how data is accessed and shared globally. However, this transformation has also highlighted a significant divide in Open Access Data (OAD) between high-resource and low-resource languages. A recent PhD proposal aims to bridge this gap by enhancing the language coverage of Linked Open Data knowledge graphs (LOD KGs).

Understanding the Divide

As language plays a crucial role in digital representation, the disparity in language resources can lead to the exclusion of numerous communities from participating in the global digital landscape. The proposed research focuses on identifying and analyzing key variables that characterize language distribution within LOD. These variables include:

Number of Wikipedia articles per language edition
Number of language-tagged entities in LOD KGs

By examining these variables across three major multilingual LOD KGs (DBpedia, BabelNet, and Wikidata), the research aims to characterize how languages are represented and distributed within the LOD ecosystem.
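To make the second variable concrete, the following minimal sketch counts how many entities carry a label in each language and what share of all entities that covers. The triples are a made-up toy sample and the function names are hypothetical; the proposal does not describe its tooling, and a real analysis would read language-tagged literals from DBpedia, BabelNet, or Wikidata dumps.

```python
from collections import Counter

# Hypothetical toy sample of (entity, label, language_tag) triples.
# Real KGs expose these as language-tagged literals, e.g. "eau"@fr.
LABELS = [
    ("e1", "water", "en"),
    ("e1", "eau", "fr"),
    ("e1", "maji", "sw"),
    ("e2", "sun", "en"),
    ("e2", "soleil", "fr"),
    ("e3", "tree", "en"),
]


def entities_per_language(labels):
    """Count distinct entities that have at least one label in each language."""
    seen = set()
    counts = Counter()
    for entity, _label, lang in labels:
        if (entity, lang) not in seen:
            seen.add((entity, lang))
            counts[lang] += 1
    return counts


def coverage(labels):
    """Share of all entities that have a label in each language."""
    total = len({entity for entity, _label, _lang in labels})
    return {lang: n / total for lang, n in entities_per_language(labels).items()}


if __name__ == "__main__":
    for lang, share in sorted(coverage(LABELS).items()):
        print(f"{lang}: {share:.2f}")
```

In this toy sample every entity has an English label, while only one in three has a Swahili one, which is the kind of imbalance the proposed analysis sets out to measure.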

Proposed Methodology

The research intends to build on the initial analysis by studying the impact of cross-lingual transfer candidate selection on the task of multilingual KG completion. This involves investigating strategies that leverage:

Linguistic proximity between languages
Availability of curated annotated alignments between languages

These strategies aim to improve multilingual KG completion and, through it, the representation of low-resource languages. By exploiting linguistic proximity, the proposal also seeks to explore analogical reasoning, which relies on the (dis)similarities between languages and has not yet been thoroughly investigated as a way to identify correspondences across languages.
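One way to picture candidate selection is as a ranking of possible source languages for a given target language. The sketch below is only an illustration under stated assumptions: the feature vectors, the language names, and the additive bonus for curated alignments are all hypothetical, and the proposal does not say how the two signals would actually be combined.

```python
# Hypothetical binary typological features per language (illustrative values only).
FEATURES = {
    "target": (1, 0, 1, 1),
    "lang_a": (1, 0, 1, 0),
    "lang_b": (0, 1, 0, 0),
    "lang_c": (1, 0, 1, 1),
}

# Hypothetical flags: does a curated alignment with the target language exist?
HAS_ALIGNMENT = {"lang_a": True, "lang_b": True, "lang_c": False}


def proximity(u, v):
    """Fraction of features on which two languages agree."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def rank_candidates(target, features, has_alignment, alignment_bonus=0.2):
    """Rank source languages by proximity to the target, plus an assumed
    fixed bonus when a curated alignment is available."""
    scores = {}
    for lang, vec in features.items():
        if lang == target:
            continue
        score = proximity(features[target], vec)
        if has_alignment.get(lang, False):
            score += alignment_bonus
        scores[lang] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda lang: (-scores[lang], lang))


if __name__ == "__main__":
    print(rank_candidates("target", FEATURES, HAS_ALIGNMENT))
```

With these toy values, a language that matches the target on every feature still ranks first even without a curated alignment. Changing `alignment_bonus` shifts that trade-off, which is the kind of question the proposed study of selection strategies would address.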

Potential Impact on Low-Resource Languages

The intended outcome is a better digital representation of low-resource languages and, with it, greater inclusivity in the global digital transformation. Enhanced language coverage in LOD benefits speakers of these languages and also enriches the knowledge graphs themselves, making the digital landscape more diverse and representative.

Furthermore, as digital technologies continue to evolve, addressing the needs of low-resource languages through advanced methodologies in knowledge graph construction and completion could pave the way for more equitable access to information and resources. The research underscores the importance of inclusivity in the digital age, emphasizing that every language and its speakers deserve representation in the vast digital universe.

Conclusion

The proposed PhD research represents a critical step in addressing the digital divide faced by low-resource languages. By leveraging knowledge graphs and focusing on linguistic strategies, this work promises to enhance language representation in OAD, fostering a more inclusive digital future. As the project unfolds, the insights gained will be essential for shaping data accessibility and representation in a rapidly digitizing world.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
