GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning
arXiv:2604.02721v1
Abstract: Competitive programming remains one of the last few human strongholds in coding against AI. The best AI system to date still underperforms the best humans in competitive programming: the most recent best result, Google’s Gemini 3 Deep Think, attained 8th place even without being evaluated under live competition conditions. In this work, we introduce GrandCode, a multi-agent RL system designed for competitive programming.
Introduction to GrandCode
Competitive programming has long been regarded as a domain where the best human contestants outperform AI systems. GrandCode is a multi-agent reinforcement learning (RL) system built to close that gap. The authors attribute its capability to two key factors:
- It orchestrates a variety of agentic modules (hypothesis proposal, solver, test generator, summarization, etc.) and jointly improves them through post-training and online test-time RL.
- It introduces Agentic GRPO, which is designed for multi-stage agent rollouts with delayed rewards and for the severe off-policy drift that is prevalent in agentic RL.
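As a rough illustration of the group-relative idea behind GRPO-style methods, the sketch below normalizes the final rewards of several rollouts on the same problem, then shares each rollout's advantage across all of its stages. Sharing is one simple way to handle a reward that only arrives at the end of a multi-stage rollout. This is not the paper's Agentic GRPO, whose details are not given here; the function names, the example stage counts, and the broadcast rule are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's final reward against its own group.

    Standard GRPO-style normalization: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_stages(advantages, stages_per_rollout):
    """Assumption (not from the paper): every stage of a rollout
    receives that rollout's advantage, since the reward is delayed
    until the whole rollout has finished."""
    return [[a] * n for a, n in zip(advantages, stages_per_rollout)]


# Hypothetical group of four rollouts for one problem:
# reward 1.0 if the final submission passed, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical number of agent stages in each rollout.
per_stage = broadcast_to_stages(advantages, [3, 2, 4, 3])
```

Passing rollouts get a positive advantage and failing ones a negative advantage. A group whose rewards are all equal yields zero advantages, so it contributes no learning signal.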
Achievements of GrandCode
According to the authors, GrandCode is the first AI system to consistently outperform all human participants in live competitive programming contests. It placed first in three consecutive live Codeforces rounds:
- Round 1087 (March 21, 2026)
- Round 1088 (March 28, 2026)
- Round 1089 (March 29, 2026)
In each of these rounds, GrandCode took first place and finished ahead of the human participants, including legendary grandmasters. The authors present this as evidence that AI systems can now surpass the strongest human competitive programmers.
The Future of Competitive Programming with AI
GrandCode raises questions about the future of competitive programming. If systems of this kind keep improving, they may redefine what is achievable in coding competitions. Techniques such as Agentic GRPO, combined with online test-time RL, suggest that such systems could keep improving through continued learning and adaptation.
Conclusion
GrandCode marks a notable shift at the intersection of AI and competitive programming. By placing first in live contests, it opens the way for further work on AI capabilities and may change how coding challenges are approached and solved. These advances are likely to influence not only competitive programming but also the broader field of software development.
