Explainability and Certification of AI-Generated Educational Assessments
The rapid adoption of generative artificial intelligence (AI) in educational assessment has created new opportunities for scalable item creation, personalized feedback, and efficient formative evaluation. However, despite advances in taxonomy alignment and automated question generation, the absence of transparent, explainable, and certifiable mechanisms limits institutional and accreditation-level acceptance.
This article explores a comprehensive framework for explainability and certification of AI-generated assessment items. It combines self-rationalization, attribution-based analysis, and post-hoc verification to produce interpretable cognitive-alignment evidence grounded in Bloom’s and SOLO taxonomies.
Key Components of the Proposed Framework
- Self-Rationalization: The AI generates an explanation for each assessment item it produces, which improves transparency and trust in the generated items.
- Attribution-Based Analysis: This analysis shows educators which parts of the input most influenced a generated item, linking each assessment item back to the concepts it draws on.
- Post-Hoc Verification: After generation, items are reviewed and validated to ensure they meet educational standards.
Certification Metadata Schema
A structured certification metadata schema is introduced to capture critical information regarding AI-generated assessments. This schema includes:
- Provenance: Documenting the origin and development process of each assessment item.
- Alignment Predictions: Recording the predicted alignment of each item with established educational standards and learning objectives.
- Reviewer Actions: Tracking the actions taken by reviewers during the certification process.
- Ethical Indicators: Recording the ethical concerns identified for each generated assessment item.
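To make the four schema components concrete, the following is a minimal sketch of how one certification record might be represented. The class name, field names, and sample values (such as `bloom_level`, `prompt_id`, and `example-model`) are illustrative assumptions and are not taken from the framework itself.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class CertificationRecord:
    """Hypothetical metadata record for one AI-generated assessment item."""

    item_id: str
    # Provenance: origin and development process (keys are assumptions).
    provenance: Dict[str, str] = field(default_factory=dict)
    # Alignment predictions: predicted level plus a confidence score.
    alignment: Dict[str, object] = field(default_factory=dict)
    # Reviewer actions: ordered log of what each reviewer did.
    reviewer_actions: List[Dict[str, str]] = field(default_factory=list)
    # Ethical indicators: named flags, True meaning a concern was raised.
    ethical_flags: Dict[str, bool] = field(default_factory=dict)

    def log_review(self, reviewer: str, action: str) -> None:
        """Append one reviewer action to the audit trail."""
        self.reviewer_actions.append({"reviewer": reviewer, "action": action})


# Example record with made-up values, for illustration only.
record = CertificationRecord(
    item_id="q-001",
    provenance={"generator": "example-model", "prompt_id": "p-17"},
    alignment={"bloom_level": "Apply", "confidence": 0.91},
    ethical_flags={"bias_concern": False},
)
record.log_review("reviewer-a", "approved")
exported = asdict(record)  # plain dict, ready to serialize for an audit log
```

Keeping reviewer actions as an append-only list is one simple way to preserve an audit trail; a real deployment would likely add timestamps and a versioned schema.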
Traffic-Light Certification Workflow
The framework also introduces a traffic-light certification workflow, which operationalizes the signals captured in the metadata schema. This workflow sorts items into three categories:
- Auto-Certifiable Items: Assessment items that meet all necessary criteria without requiring human intervention.
- Items Requiring Human Review: Items that need further evaluation by educators or subject matter experts.
- Rejected Items: Items that do not meet the required standards and are eliminated from use.
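The three categories above can be sketched as a simple triage rule. This is a hedged illustration only: the function `triage`, the two thresholds, the choice of alignment confidence and a single ethical flag as inputs, and the mapping of categories to green, amber, and red are all assumptions, not the decision rule used in the framework.

```python
from enum import Enum


class Light(Enum):
    GREEN = "auto-certifiable"
    AMBER = "requires human review"
    RED = "rejected"


def triage(confidence: float, ethical_flag: bool,
           auto_threshold: float = 0.9,
           reject_threshold: float = 0.5) -> Light:
    """Assign a traffic-light category to one item.

    Thresholds are hypothetical placeholders, not published values.
    """
    # Very low alignment confidence: the item is rejected outright.
    if confidence < reject_threshold:
        return Light.RED
    # Any ethical concern, or middling confidence, goes to a human.
    if ethical_flag or confidence < auto_threshold:
        return Light.AMBER
    # High confidence and no ethical concern: certify automatically.
    return Light.GREEN


decisions = [
    triage(0.95, False),  # GREEN
    triage(0.95, True),   # AMBER: flagged despite high confidence
    triage(0.70, False),  # AMBER: confidence below auto threshold
    triage(0.30, False),  # RED
]
```

In this sketch an ethical flag routes an item to human review instead of rejecting it, on the assumption that such concerns call for expert judgment; an institution could equally choose a stricter policy.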
Proof-of-Concept Study
A proof-of-concept study involving 500 AI-generated computer science questions demonstrates the framework’s feasibility. Results indicate improved transparency, reduced instructor workload, and enhanced auditability of assessment items.
Conclusion
The article concludes by outlining the ethical implications, policy considerations, and directions for future research. It positions explainability and certification as essential components of trustworthy, accreditation-ready AI assessment systems. As AI continues to evolve, robust frameworks for transparency and accountability will be crucial to maintaining the integrity of educational assessments.
