AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation
arXiv:2604.09617v1
Abstract
Transparent and standardized documentation is essential for building trustworthy generative AI (GAI) systems. However, existing automated methods for generating model and data cards still face three major challenges:
- Static Templates: Most systems rely on fixed query templates that cannot adapt to diverse paper structures or evolving documentation requirements.
- Information Scarcity: Web-scale repositories such as Hugging Face often contain incomplete or inconsistent metadata, leading to missing or noisy information.
- Lack of Benchmarks: The absence of standardized datasets and evaluation protocols hinders fair and reproducible assessment of documentation quality.
Proposed Solution: AdaQE-CG
To address these limitations, we propose AdaQE-CG (Adaptive Query Expansion for Card Generation), a framework that combines dynamic information extraction with cross-card knowledge transfer.
Key Components
AdaQE-CG consists of two modules:
- Intra-Paper Extraction via Context-Aware Query Expansion (IPE-QE): This module iteratively refines extraction queries to recover richer and more complete information from scientific papers and repositories.
- Inter-Card Completion using the MetaGAI Pool (ICC-MP): This module fills missing fields by transferring semantically relevant content from similar cards in the MetaGAI Pool, a curated dataset of existing cards.
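The summary describes IPE-QE only at a high level. As a rough illustration of iterative query expansion, the Python sketch below substitutes classic pseudo-relevance feedback over lexical overlap: each round retrieves the best-matching passage not yet seen for a card field, then appends that passage's most frequent new terms to the query. The function names, the stopword list, and the `max_rounds` and `new_terms` parameters are hypothetical; the actual module is described as context-aware, so it presumably relies on richer signals than token overlap.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list for this toy example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we", "on", "for", "with", "was", "are", "by"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def score(query_terms, passage):
    """Number of query terms that occur in the passage (simple lexical overlap)."""
    tokens = set(tokenize(passage))
    return sum(1 for t in query_terms if t in tokens)


def expand_and_retrieve(field, passages, max_rounds=3, new_terms=2):
    """Hypothetical iterative query expansion for one card field.

    Each round picks the highest-scoring unseen passage, then adds its most
    frequent terms that are not yet in the query (pseudo-relevance feedback).
    """
    query = tokenize(field)
    found = []
    for _ in range(max_rounds):
        candidates = [p for p in passages if p not in found]
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(query, p))
        if score(query, best) == 0:
            break  # nothing left that matches the current query
        found.append(best)
        counts = Counter(t for t in tokenize(best) if t not in query)
        query.extend(t for t, _ in counts.most_common(new_terms))
    return query, found


passages = [
    "The model was trained on a web corpus of 2 billion tokens.",
    "Training data: we collected a web corpus from public forums.",
    "Public forums were filtered for toxicity before release.",
    "We thank the reviewers.",
]
query, found = expand_and_retrieve("training data", passages)
print(query)
print(found)
```

In this toy run the seed query `training data` matches only the second passage; after one expansion round the added term `web` also surfaces the first passage, which states the corpus size. Excluding already-retrieved passages forces each round to look for new evidence instead of re-ranking the same hit.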
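The summary also does not specify how ICC-MP measures similarity or chooses donor cards. The Python sketch below is one plausible, simplified reading under stated assumptions: cards are plain dictionaries of text fields, similarity is cosine similarity between bag-of-words vectors of the filled fields, and each empty field is copied from the most similar pool card that has it, but only if that similarity clears a threshold. The card schema, the `complete_card` function, and the `min_sim` threshold are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(card, skip=()):
    """Token counts over all non-empty fields of a card, excluding `skip` fields."""
    text = " ".join(v for k, v in card.items() if v and k not in skip)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def complete_card(card, pool, min_sim=0.2):
    """Hypothetical cross-card completion: fill empty fields from similar pool cards.

    Returns the completed copy and a map from each filled field to its donor card name.
    """
    query = bag_of_words(card)
    filled = dict(card)  # never mutate the input card
    sources = {}
    for field, value in card.items():
        if value:
            continue
        donors = [c for c in pool if c.get(field)]
        if not donors:
            continue
        # Compare on the donor's other fields, since the query card lacks this one.
        scored = [(cosine(query, bag_of_words(c, skip=(field,))), c) for c in donors]
        sim, best = max(scored, key=lambda pair: pair[0])
        if sim >= min_sim:
            filled[field] = best[field]
            sources[field] = best.get("name", "?")
    return filled, sources


pool = [
    {"name": "llama-chat-7b", "task": "text generation chat", "license": "llama2",
     "training_data": "chat dialogues instruction tuning"},
    {"name": "sd-v1", "task": "image generation", "license": "creativeml-openrail-m",
     "training_data": "laion images"},
]
card = {"name": "tiny-llama-chat", "task": "text generation chat", "license": "",
        "training_data": "instruction tuning chat dialogues"}
filled, sources = complete_card(card, pool)

card2 = {"name": "audio-x", "task": "speech recognition", "license": ""}
filled2, sources2 = complete_card(card2, pool)
print(filled["license"], sources, filled2["license"] == "", sources2)
```

Recording the donor for each filled field keeps transferred content traceable, which matters for documentation meant to support transparency. The threshold keeps an unrelated card (here, a speech model) from inheriting a license from a text or image model.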
Introduction of MetaGAI-Bench
In addition to AdaQE-CG, we introduce MetaGAI-Bench, the first large-scale, expert-annotated benchmark for evaluating GAI documentation. It addresses the lack of benchmarks by providing a standardized, reproducible basis for assessing the transparency and consistency of generated model and data cards.
Results and Performance
Experiments across five quality dimensions show that AdaQE-CG substantially outperforms existing approaches. Its generated data cards exceed the quality of human-authored ones, and its model cards approach human-level quality. These results indicate that AdaQE-CG can improve the quality of generative AI documentation.
Access to Resources
Code, prompts, and data are publicly available at https://github.com/haoxuan-unt2024/AdaQE-CG.
