Evaluating LLMs for Competency Question Generation


Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

Summary: arXiv:2604.16258v1

Abstract: Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should satisfy; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. The use of Generative AI automates CQ creation at scale, therefore democratising the process of generation, widening stakeholder engagement, and ultimately broadening access to ontology engineering.

However, given the large and heterogeneous landscape of LLMs, varying in dimensions such as parameter scale, task and domain specialisation, and accessibility, it is crucial to characterise and understand the intrinsic, observable properties of the CQs they produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper introduces a set of quantitative measures for the systematic comparison of CQs across multiple dimensions.

Using CQs generated from well-defined use cases and scenarios, we identify their salient properties, including:

  • Readability
  • Relevance with respect to the input text
  • Structural complexity of the generated questions

We conduct our experiments over a set of use cases and requirements using a range of LLMs, including both open models (KimiK2-1T, LLama3.1-8B, LLama3.2-3B) and closed models (Gemini 2.5 Pro, GPT 4.1). Our analysis demonstrates that LLM performance reflects distinct generation profiles shaped by the use case.

Introduction

Ontology engineering is a vital aspect of knowledge representation and management, and the role of Competency Questions in this field cannot be overstated. Traditionally, creating CQs has required close collaboration between domain experts and ontology engineers, making the process time-consuming and resource-intensive.

With the advent of Generative AI, the potential for automating the generation of these questions has opened up new avenues for efficiency and accessibility. This study aims to systematically evaluate how different LLMs perform in generating CQs and the characteristics of the questions produced.

Methodology

We employed a diverse set of use cases to evaluate the performance of various LLMs. The models selected for analysis were chosen based on their availability and performance metrics. Each model was tasked with generating CQs based on a predefined set of requirements.

The quantitative measures used for analysis included:

  • Readability: Assessed using standard readability metrics to determine how easily the generated questions can be understood.
  • Relevance: Evaluated by comparing the generated questions to the input text to ensure they align with the intended requirements.
  • Structural Complexity: Measured by analysing the syntax and complexity of the generated questions.
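To make the three measures concrete, here is a minimal sketch of how each could be approximated for a single generated question. The summary does not say which metrics the study uses, so every choice here is an assumption made for illustration: Flesch Reading Ease with a crude vowel-group syllable count stands in for "standard readability metrics", word overlap with the input text stands in for relevance, and token and connective counts stand in for structural complexity. The function names, the CONNECTIVES list, and the example texts are all hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z']+")
VOWEL_GROUPS = re.compile(r"[aeiouy]+")
# Hypothetical list of words treated as signs of a more complex question.
CONNECTIVES = {"and", "or", "that", "which", "who", "where", "when", "if"}


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def lexical_relevance(question: str, source_text: str) -> float:
    """Fraction of the question's distinct words that also occur in the source text."""
    q_words = {w.lower() for w in WORD.findall(question)}
    s_words = {w.lower() for w in WORD.findall(source_text)}
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def structural_complexity(question: str) -> dict:
    """Surface proxies only: token count and number of connective words."""
    tokens = [w.lower() for w in WORD.findall(question)]
    return {
        "tokens": len(tokens),
        "connectives": sum(t in CONNECTIVES for t in tokens),
    }


if __name__ == "__main__":
    # Made-up use case and question, for illustration only.
    use_case = "The library lends books to registered members."
    cq = "Which books are lent to members?"
    print(flesch_reading_ease(cq))
    print(lexical_relevance(cq, use_case))
    print(structural_complexity(cq))
```

Word overlap is a weak proxy for relevance, since it misses paraphrases such as "lends" and "lent"; an embedding-based similarity would be a natural alternative, and a syntactic parser would give a richer view of structure than surface counts.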

Results and Discussion

The findings of our study indicate significant variations in LLM performance depending on the use case. In specific contexts, open models generated more relevant and readable questions, while closed models produced questions of greater structural complexity.

This research contributes to the understanding of how LLMs can be effectively leveraged in ontology engineering, providing valuable insights for future applications and developments in the field.


Lazarus Omolua