CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation
Automated question generation is an active area of educational technology. CODE-GEN is a human-in-the-loop, retrieval-augmented generation (RAG)-based agentic AI system that creates context-aligned multiple-choice questions intended to strengthen students' code reasoning and comprehension.
Overview of CODE-GEN
CODE-GEN uses a dual-agent framework consisting of a Generator agent and a Validator agent. The Generator produces multiple-choice questions aligned with the specific learning objectives of a coding course. The Validator independently evaluates each generated question on seven distinct pedagogical dimensions.
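The division of labor between the two agents can be pictured with a minimal Python sketch. Everything in it is hypothetical: the `MCQ` and `Review` classes, the `run_pipeline` function, and the stub `generate`/`validate` callables stand in for model calls that this article does not describe. Only four dimensions appear in `DIMENSIONS` because those are the ones this article names explicitly; the full set of seven is not listed here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Dimension names taken from this article; the full set of seven is not listed in it.
DIMENSIONS = [
    "question clarity",
    "code validity",
    "concept alignment",
    "correct answer validity",
]


@dataclass
class MCQ:
    """A multiple-choice question tied to one learning objective (hypothetical shape)."""
    objective: str
    stem: str
    options: list[str]
    answer_index: int


@dataclass
class Review:
    """The Validator's per-dimension verdicts for one question."""
    question: MCQ
    verdicts: dict[str, bool] = field(default_factory=dict)


def run_pipeline(
    objective: str,
    generate: Callable[[str], MCQ],
    validate: Callable[[MCQ, str], bool],
    dimensions: list[str] = DIMENSIONS,
) -> Review:
    """Generate one question, then validate it independently on each dimension."""
    question = generate(objective)
    review = Review(question)
    for dimension in dimensions:
        review.verdicts[dimension] = validate(question, dimension)
    return review  # a human expert would then agree or disagree with each verdict


# Stub agents: placeholders for LLM-backed agents, used only to make the sketch runnable.
def stub_generate(objective: str) -> MCQ:
    return MCQ(objective, "What does print(len('abc')) output?", ["2", "3", "4", "abc"], 1)


def stub_validate(question: MCQ, dimension: str) -> bool:
    if dimension == "correct answer validity":
        return question.options[question.answer_index] == str(len("abc"))
    return True  # placeholder verdict for dimensions the stub cannot check


review = run_pipeline("string length", stub_generate, stub_validate)
print(review.verdicts)
```

Passing the agents in as callables keeps generation and validation decoupled, which mirrors the independence between the two agents described above.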
Enhanced Computational Accuracy
Both agents are equipped with specialized tools that improve computational accuracy and verify code outputs. These tools are intended to ease common challenges in question generation, such as maintaining relevance and ensuring correctness in coding scenarios.
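One way to picture a code-output verification tool is a helper that actually runs the snippet embedded in a question and compares what it prints with the option marked as correct. The sketch below is an assumption about how such a check could work, not a description of CODE-GEN's tools; `run_snippet` and `answer_matches_output` are hypothetical names.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 5.0) -> str:
    """Run a Python snippet in a separate interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def answer_matches_output(code: str, claimed_answer: str) -> bool:
    """Check that the answer marked correct equals what the snippet really prints."""
    return run_snippet(code) == claimed_answer.strip()


snippet = "xs = [1, 2, 3]\nprint(sum(x * x for x in xs))"
print(answer_matches_output(snippet, "14"))
print(answer_matches_output(snippet, "13"))
```

A separate process with a timeout guards against infinite loops, but it is not a security sandbox; generated code from an untrusted model would need stronger isolation.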
Evaluation Study
An evaluation study assessed CODE-GEN with six human subject-matter experts (SMEs). The SMEs evaluated 288 AI-generated questions, yielding 2,016 human-AI rating pairs, which is consistent with one pair per question per dimension (288 × 7 = 2,016). Each pair records whether the SME agreed or disagreed with the Validator’s assessment. The SMEs also provided 131 pieces of qualitative feedback.
Performance Metrics
SME judgments yielded human-validated success rates ranging from 79.9% to 98.6% across the seven pedagogical dimensions. This level of agreement suggests that the system can generate quality educational content, although reliability varies by dimension.
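The per-dimension figures can be read as simple proportions. The sketch below assumes that a dimension's success rate is the share of its rating pairs in which the SME agreed with the Validator; the article does not spell out the exact definition, so treat this as one plausible reading. The `success_rates` function is hypothetical and the toy data is made up purely for illustration; it is not the study's data.

```python
from collections import defaultdict

# Consistency check on the reported totals: 288 questions x 7 dimensions.
assert 288 * 7 == 2016


def success_rates(pairs):
    """Map (dimension, sme_agrees) pairs to {dimension: percent agreement}."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for dimension, sme_agrees in pairs:
        total[dimension] += 1
        agree[dimension] += int(sme_agrees)
    return {d: round(100 * agree[d] / total[d], 1) for d in total}


# Made-up toy ratings, NOT taken from the evaluation study.
toy_pairs = (
    [("code validity", True)] * 9
    + [("code validity", False)]
    + [("question clarity", True)] * 4
    + [("question clarity", False)]
)

rates = success_rates(toy_pairs)
print(rates)
```

With real data, each dimension would have its own list of agree/disagree judgments, and the reported range would be the minimum and maximum of the resulting percentages.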
Qualitative Feedback
Qualitative feedback from the SMEs indicated that CODE-GEN is most reliable on dimensions that lend themselves to computational verification and explicit criteria matching. Key dimensions where the system demonstrated high reliability include:
- Question clarity
- Code validity
- Concept alignment
- Correct answer validity
However, the feedback also highlighted the continued importance of human expertise, particularly in areas that require deeper instructional judgment. For example, designing pedagogically meaningful distractors and providing high-quality feedback that reinforces understanding remain critical tasks that benefit from human insight.
Implications for AI-Assisted Education
The findings from the CODE-GEN evaluation study offer guidance on how to allocate human and AI effort in educational content generation. AI can carry the dimensions that are computationally verifiable, while human educators focus on those that require instructional judgment, with the aim of improving the quality and effectiveness of learning materials.
As the field of AI in education continues to evolve, systems like CODE-GEN represent a significant step forward in the integration of technology into pedagogical practices, offering promising solutions for the challenges faced by educators and learners alike.
