Evaluating LLMs for Competency Question Generation

Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

Summary: arXiv:2604.16258v1

Abstract: Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should satisfy; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. The use of Generative AI automates CQ creation at scale, therefore democratising the process of generation, widening stakeholder engagement, and ultimately broadening access to ontology engineering.

However, given the large and heterogeneous landscape of LLMs, varying in dimensions such as parameter scale, task and domain specialisation, and accessibility, it is crucial to characterise and understand the intrinsic, observable properties of the CQs they produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper introduces a set of quantitative measures for the systematic comparison of CQs across multiple dimensions.

Using CQs generated from well-defined use cases and scenarios, we identify their salient properties, including:

Readability
Relevance with respect to the input text
Structural complexity of the generated questions

We conduct our experiments over a set of use cases and requirements using a range of LLMs, including both open models (KimiK2-1T, LLama3.1-8B, LLama3.2-3B) and closed models (Gemini 2.5 Pro, GPT 4.1). Our analysis demonstrates that LLM performance reflects distinct generation profiles shaped by the use case.

Introduction

Ontology engineering is a vital aspect of knowledge representation and management, and the role of Competency Questions in this field cannot be overstated. Traditionally, creating CQs required significant collaboration between domain experts and ontology engineers, making the process time-consuming and resource-intensive.

With the advent of Generative AI, the potential for automating the generation of these questions has opened up new avenues for efficiency and accessibility. This study aims to systematically evaluate how different LLMs perform in generating CQs and the characteristics of the questions produced.

Methodology

We employed a diverse set of use cases to evaluate the performance of various LLMs. The models were chosen based on their availability and performance metrics, and cover both open models (KimiK2-1T, LLama3.1-8B, LLama3.2-3B) and closed models (Gemini 2.5 Pro, GPT 4.1). Each model was tasked with generating CQs from a predefined set of requirements.
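The generation setup described above can be sketched as a small harness that runs every model over every use case and collects the resulting questions. This is a minimal illustration, not the study's actual pipeline: the prompt wording, the `generate` callable (a stand-in for whatever client calls each model), and the helper names `build_prompt`, `parse_cqs`, and `collect_cqs` are all assumptions introduced here.

```python
import re
from typing import Callable, Dict, List, Tuple

# Model names as listed in the study.
MODELS = ["KimiK2-1T", "LLama3.1-8B", "LLama3.2-3B", "Gemini 2.5 Pro", "GPT 4.1"]

# Hypothetical prompt wording; the prompt used in the study is not given here.
PROMPT_TEMPLATE = (
    "You are assisting with ontology requirements elicitation.\n"
    "Read the use case below and list competency questions, one per line.\n\n"
    "Use case:\n{use_case}\n"
)

# Matches a leading bullet ("-", "*", "•") or numbering ("1.", "2)").
LIST_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def build_prompt(use_case: str) -> str:
    """Fill the prompt template with one use case description."""
    return PROMPT_TEMPLATE.format(use_case=use_case.strip())


def parse_cqs(raw_output: str) -> List[str]:
    """Keep only lines that look like questions, without list prefixes."""
    cqs = []
    for line in raw_output.splitlines():
        text = LIST_PREFIX.sub("", line).strip()
        if text.endswith("?"):
            cqs.append(text)
    return cqs


def collect_cqs(
    use_cases: Dict[str, str],
    generate: Callable[[str, str], str],
) -> Dict[Tuple[str, str], List[str]]:
    """Run each model on each use case.

    `generate(model_name, prompt)` is a caller-supplied function that
    returns the model's raw text output (hypothetical interface).
    """
    results = {}
    for model in MODELS:
        for name, text in use_cases.items():
            raw = generate(model, build_prompt(text))
            results[(model, name)] = parse_cqs(raw)
    return results
```

Keying the results by (model, use case) keeps the two factors separate, so the questions can later be compared per model, per use case, or both.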

The quantitative measures used for analysis included:

Readability: Assessed using standard readability metrics to determine how easily the generated questions can be understood.
Relevance: Evaluated by comparing the generated questions to the input text to ensure they align with the intended requirements.
Structural Complexity: Measured by analysing the syntax and complexity of the generated questions.
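To make the three measures concrete, the sketch below computes one simple proxy for each using only the Python standard library. These are illustrative stand-ins, not the measures defined in the paper: readability uses the Flesch Reading Ease formula with a rough syllable heuristic, relevance uses bag-of-words cosine similarity against the input text, and structural complexity uses token, clause-marker, and comma counts. All function names and the `CLAUSE_MARKERS` set are assumptions.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")

# Hypothetical, deliberately small set of words that often introduce clauses.
CLAUSE_MARKERS = {"and", "or", "that", "which", "who", "where", "when", "if", "while"}


def tokens(text):
    """Lowercased word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def count_syllables(word):
    """Rough heuristic: vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean easier text."""
    words = tokens(text)
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def cosine_relevance(question, source_text):
    """Cosine similarity between term-count vectors of question and input text."""
    q, s = Counter(tokens(question)), Counter(tokens(source_text))
    dot = sum(q[t] * s[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in s.values()))
    return dot / norm if norm else 0.0


def structural_complexity(question):
    """Surface-level complexity proxies for a single question."""
    words = tokens(question)
    return {
        "n_tokens": len(words),
        # Skip the first word so a leading interrogative is not counted.
        "n_clause_markers": sum(w in CLAUSE_MARKERS for w in words[1:]),
        "n_commas": question.count(","),
    }
```

Surface proxies like these are cheap and reproducible, but they are coarse. An embedding-based similarity or a syntactic parse would capture relevance and structure more faithfully, at the cost of extra dependencies.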

Results and Discussion

Our findings indicate that LLM performance varies significantly with the use case. Open models generated more relevant and readable questions in specific contexts, while closed models stood out on structural complexity.

This research contributes to the understanding of how LLMs can be effectively leveraged in ontology engineering, providing valuable insights for future applications and developments in the field.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
