AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
Autonomous Machine Learning Engineering (MLE) requires agents that can sustain iterative optimization over long horizons. Agents based on large language models (LLMs) have shown promise, but current prompt-based MLE agents have a significant limitation: because their parameters are frozen, their behavior stagnates instead of improving with experience.
To address this limitation, we introduce AceGRPO, which has two core components designed to make learning for autonomous MLE more efficient and effective.
Key Components of AceGRPO
- Evolving Data Buffer: This component continuously repurposes execution traces into reusable training tasks. The pool of tasks therefore grows as the agent runs, which keeps the agent exposed to diverse scenarios and supports continuous learning.
- Adaptive Sampling: Guided by a Learnability Potential function, this component dynamically prioritizes tasks at the agent's learning frontier. The goal is to maximize learning efficiency by spending training effort on the tasks most likely to improve the agent's performance.
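The interaction between the two components can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the AceGRPO implementation: the `Task` and `EvolvingBuffer` classes, the `floor` parameter, and in particular the scoring rule. This summary does not define the Learnability Potential function, so the sketch substitutes p * (1 - p), where p is a task's empirical success rate. That stand-in is highest for tasks the agent solves about half the time and zero for tasks it always or never solves.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """A training task derived from an execution trace (hypothetical structure)."""
    task_id: str
    successes: int = 0
    attempts: int = 0

    def success_rate(self) -> float:
        # Assumption: an untried task is treated as maximally uncertain.
        return 0.5 if self.attempts == 0 else self.successes / self.attempts


def learnability(task: Task) -> float:
    """Stand-in score p * (1 - p); NOT the paper's Learnability Potential."""
    p = task.success_rate()
    return p * (1.0 - p)


@dataclass
class EvolvingBuffer:
    """Toy buffer: traces become tasks, and sampling favors frontier tasks."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add_from_trace(self, task_id: str) -> None:
        # Register a task derived from a trace; keep existing statistics.
        self.tasks.setdefault(task_id, Task(task_id))

    def record(self, task_id: str, solved: bool) -> None:
        # Update the outcome statistics used by the score.
        task = self.tasks[task_id]
        task.attempts += 1
        task.successes += int(solved)

    def sample(self, k: int, rng: random.Random, floor: float = 1e-3) -> List[Task]:
        # The small floor keeps every task reachable with nonzero probability.
        pool = list(self.tasks.values())
        weights = [learnability(t) + floor for t in pool]
        return rng.choices(pool, weights=weights, k=k)


if __name__ == "__main__":
    buf = EvolvingBuffer()
    for name, wins in (("easy", 10), ("frontier", 5), ("hard", 0)):
        buf.add_from_trace(name)
        for i in range(10):
            buf.record(name, solved=i < wins)
    batch = buf.sample(k=8, rng=random.Random(0))
    print([t.task_id for t in batch])
```

In this toy setup, tasks that are always solved or never solved receive almost no sampling weight, so most of each batch comes from the task solved half the time. A real system would also need a policy for retiring stale tasks and for deriving tasks from traces, neither of which is modeled here.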
Performance and Results
Training with AceGRPO produced the Ace-30B model. Ace-30B achieves a 100% valid submission rate on the MLE-Bench-Lite benchmark, an indicator of its reliability on MLE tasks, and its performance approaches that of proprietary frontier models.
Notably, Ace-30B outperforms larger open-source baselines such as DeepSeek-V3.2, which indicates that it is robust on sustained iterative optimization tasks. These results suggest that AceGRPO is a promising way to train autonomous agents for complex machine learning tasks.
Conclusion
AceGRPO addresses a key shortcoming of existing prompt-based MLE agents, namely behavioral stagnation under frozen parameters, by combining the Evolving Data Buffer with Adaptive Sampling. Together, these components let the agent keep learning from its own execution traces and concentrate training on the tasks where it can improve most. The code is available at https://github.com/yuzhu-cai/AceGRPO.
