Can AI be a Teaching Partner? Evaluating ChatGPT, Gemini, and DeepSeek across Three Teaching Strategies
Large Language Models (LLMs) are playing an increasingly significant role in education. Recent studies suggest that these models can enhance students' learning by providing tailored explanations, constructive feedback, and personalized guidance. Yet despite their popularity and widespread use, there is little empirical evidence on how effective they are as teaching agents. This article summarizes a comparative study of three well-known LLMs, ChatGPT, DeepSeek, and Gemini, focusing on their pedagogical capabilities when teaching programming concepts.
Research Overview
The study is described in the arXiv paper arXiv:2603.26673v1. The researchers designed a systematic evaluation protocol built around three distinct pedagogical strategies:
- Examples: Using concrete instances to illustrate concepts.
- Explanations and Analogies: Clarifying complex topics through relatable comparisons and detailed descriptions.
- Socratic Method: Engaging students through questioning to stimulate critical thinking and dialogue.
Methodology
To assess each LLM, the researchers recruited six human judges to evaluate the models as they taught the C programming language to novice learners. The judges applied a set of criteria focused on how well each model used the three pedagogical strategies. The evaluation examined interaction patterns, responsiveness to prompts, and overall pedagogical effectiveness.
Key Findings
The study yielded several noteworthy findings:
- For the Examples strategy and the Explanations and Analogies strategy, all three models showed similar interaction patterns, suggesting a comparable baseline competence with these approaches.
- When applying the Socratic Method, the models differed in how sensitive they were to both the pedagogical strategy and the initial prompts provided by the judges.
- ChatGPT and Gemini scored significantly higher overall than DeepSeek, suggesting that some models may be better suited to teaching roles than others.
Conclusion
The findings from this comparative study highlight the potential of LLMs as teaching partners, particularly in programming education. While ChatGPT and Gemini demonstrated stronger pedagogical performance, the results also underline the need for continued research into the complexities and limitations of LLMs in educational settings. As the technology evolves, further empirical studies will be crucial for refining these models and improving their teaching capabilities.
The promise of AI as a teaching partner is substantial, but realizing its full potential in the classroom will depend on ongoing evaluation and improvement.
