The Ideation Bottleneck: Decomposing the Quality Gap Between AI-Generated and Human Economics Research
arXiv:2604.03338v1
Abstract: Autonomous AI systems can now generate complete economics research papers, but they substantially underperform human-authored publications in head-to-head comparisons. This paper decomposes the quality gap into two independent components: research idea quality and execution quality.
The study sets out a framework for analyzing the quality gap between AI-generated and human-authored economics papers. Using a two-model ensemble of fine-tuned language models, the researchers scored both research idea quality and execution quality for 953 economics papers: 912 AI-generated papers from the APE project and 41 human-authored papers published in journals such as the American Economic Review and AEJ: Economic Policy.
Key Findings
- Significant Idea Quality Gap: The idea quality gap was large, with a Cohen’s d of 2.23 (p < 0.001). Human papers averaged a 47.1% probability of being rated exceptional, compared with 16.5% for AI-generated papers.
- Notable Execution Quality Gap: The execution quality gap, while significant, was less pronounced, with a Cohen’s d of 0.90 (p < 0.001). Human papers scored an average of 4.38 out of 5, whereas AI papers scored 3.84.
- Contribution to Overall Quality Difference: The analysis indicated that idea quality accounted for approximately 71% of the overall quality difference, whereas execution quality contributed 29%.
- Mechanism Analysis Weakness: Among the execution dimensions, the largest gap was in the depth of mechanism analysis, with a Cohen’s d of 1.43. No significant difference was found for robustness.
- Methodological Trends in AI Papers: The study documented that 74% of AI-generated papers employed difference-in-differences as their primary methodology. Notably, only 7 AI papers (0.8%) surpassed the median human paper on both idea and execution quality simultaneously.
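
The arithmetic behind these figures can be checked with a short Python sketch. It rests on two assumptions that the summary above does not confirm: that Cohen's d is computed with the usual pooled standard deviation, and that the 71% / 29% split is each gap's share of the summed effect sizes. That second reading happens to reproduce the reported numbers, but the paper may use a different decomposition. The function names `cohens_d` and `gap_shares` and the toy score lists are hypothetical and exist only for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def gap_shares(d_idea, d_execution):
    """Share of each gap in the summed effect sizes.

    Assumption: this is one reading that matches the reported 71% / 29%;
    it is not confirmed as the paper's actual decomposition method.
    """
    total = d_idea + d_execution
    return d_idea / total, d_execution / total


# Toy data only, to show the effect-size calculation itself.
toy_d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"toy Cohen's d: {toy_d:.2f}")

# Effect sizes reported in the findings above.
idea_share, execution_share = gap_shares(2.23, 0.90)
print(f"idea share: {idea_share:.0%}, execution share: {execution_share:.0%}")

# 7 of the 912 AI papers beat the median human paper on both dimensions.
both_share = 7 / 912
print(f"AI papers above human median on both: {both_share:.1%}")
```

Under these assumptions the shares come out to roughly 71% and 29%, and 7 of 912 rounds to 0.8%, in line with the figures reported above.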
Conclusion
These findings indicate that the main bottleneck in producing competitive AI-generated economics research lies in ideation rather than execution. The execution gap is real but smaller, while the gap in research idea quality accounts for most of the overall difference. Addressing this ideation bottleneck may therefore be crucial for AI systems to produce research that competes with human-authored work.
