Can AI Tools Transform Low-Demand Math Tasks? An Evaluation of Task Modification Capabilities
arXiv:2604.12743v1
Abstract: While recent research has explored AI tools’ ability to classify the quality of mathematical tasks (arXiv:2603.03512), little is known about their capacity to increase the quality of existing tasks. This study investigated whether AI tools could successfully upgrade low-cognitive-demand mathematics tasks. Eleven tools were tested, including six broadly available, general-purpose AI tools (e.g., ChatGPT and Claude) and five tools specialized for mathematics teachers (e.g., Khanmigo, coteach.ai). Using the Task Analysis Guide framework (Stein & Smith, 1998), we prompted AI tools to modify two different types of low-demand mathematical tasks. The prompting strategy was meant to represent approaches that knowledgeable teachers would likely take, not the result of extensive prompt optimization, so the results approximate an optimistic but typical outcome.
Key Findings
On average, AI tools were only moderately successful in upgrading low-demand math tasks:
- Tasks were accurately upgraded only 64% of the time.
- Performance varied widely across AI tools, ranging from quite weak (33%) to broadly successful (88%).
- Specialized tools showed only moderate advantages over general-purpose tools.
Failure Modes
The study identified two primary failure modes in task modification:
- Undershooting: The modified task remains at low cognitive demand, so the intended upgrade does not happen.
- Overshooting: The modified task is elevated to an overly ambitious target category, which teachers would likely reject.
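One way to make the two failure modes concrete is to treat the four Task Analysis Guide categories as an ordered scale and compare the level of the modified task with the intended target. The sketch below is illustrative only: the function name `classify_outcome`, the numeric ranks, and the choice of "procedures with connections" as the default target are assumptions made for this example, not details reported by the study.

```python
# Four Task Analysis Guide categories (Stein & Smith, 1998), ordered from
# lowest to highest cognitive demand. The numeric ranks are an assumption
# made for this sketch.
TAG_LEVELS = {
    "memorization": 0,
    "procedures without connections": 1,
    "procedures with connections": 2,
    "doing mathematics": 3,
}


def classify_outcome(modified_level, target_level="procedures with connections"):
    """Label a modified task relative to the intended target category.

    The default target is a hypothetical choice for illustration.
    """
    modified = TAG_LEVELS[modified_level]
    target = TAG_LEVELS[target_level]
    if modified < target:
        return "undershoot"  # task stayed below the intended demand level
    if modified > target:
        return "overshoot"   # task landed above the intended demand level
    return "success"


if __name__ == "__main__":
    # Show how each possible result would be labeled under the assumed target.
    for level in TAG_LEVELS:
        print(f"{level}: {classify_outcome(level)}")
```

Under these assumptions, a modified task that stays in either low-demand category counts as an undershoot, and one that lands above the target counts as an overshoot.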
Correlation Between Classification and Modification
The study found a small negative correlation (r = -.35) between an AI tool's ability to correctly classify the cognitive demand of tasks and its success in upgrading those tasks. In other words, tools that classified tasks more accurately tended, if anything, to upgrade them somewhat less successfully. This finding suggests that the capability to modify tasks (a generative activity) is distinct from the ability to classify them (a judgment using a rubric).
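For readers who want to see how a correlation of this kind is computed, the following is a minimal sketch of Pearson's r across tools. The per-tool numbers are made-up placeholders, not the study's data, and the names `pearson_r`, `classification_accuracy`, and `modification_success` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values for five imaginary tools (NOT the study's data):
    # share of tasks classified correctly, and share of tasks upgraded
    # successfully.
    classification_accuracy = [0.40, 0.55, 0.60, 0.70, 0.80]
    modification_success = [0.80, 0.60, 0.75, 0.50, 0.65]
    r = pearson_r(classification_accuracy, modification_success)
    print(f"r = {r:.2f}")
```

A negative r here would mean that tools scoring higher on classification tend to score lower on modification, which is the direction of the relationship the study reports.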
Implications for Curriculum Adaptation
These findings bear directly on AI’s potential role in curriculum adaptation. They highlight the need for specialized approaches that support teachers in modifying instructional materials, since neither specialization for teachers nor accurate task classification reliably translated into successful upgrades. Optimizing AI tools for specific pedagogical tasks, such as raising cognitive demand, may make them more useful in the classroom.
Conclusion
The study underscores both the promise and the limitations of AI tools for upgrading low-demand math tasks. These tools may be able to assist teachers in curriculum development, but the uneven results call for careful consideration of what each tool can and cannot do. Continued research is needed to refine these tools and to understand how they can be integrated into teaching practice.
