Can Large Language Models Reinvent Foundational Algorithms?
Summary: arXiv:2604.05716v1 Announce Type: new
Abstract: Large Language Models (LLMs) have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science?
Introduction
The advent of Large Language Models has transformed the landscape of artificial intelligence, enabling unprecedented capabilities in natural language understanding and generation. However, the question of whether these models can contribute to foundational innovations in fields such as computer science is still under investigation. This article explores this potential through a unique experimental framework.
The Unlearn-and-Reinvent Pipeline
Our research introduces the Unlearn-and-Reinvent pipeline, which focuses on the ability of LLMs to unlearn specific foundational algorithms, such as Dijkstra’s algorithm or Euclid’s algorithm, and subsequently reinvent them. This process involves:
- Applying LLM unlearning techniques to remove pre-existing knowledge of a foundational algorithm from the model.
- Testing the model’s capacity to reinvent the algorithm in a controlled environment.
- Utilizing a GRPO-based, on-policy unlearning method to facilitate effective unlearning.
Methodology
To evaluate the effectiveness of LLMs in reinventing foundational algorithms, we conducted experiments across 10 target algorithms, utilizing 3 strong open-weight models and varying hint levels. The hints provided to the models were categorized as follows:
- Hint Level 0: No hints provided.
- Hint Level 1: A few high-level hints.
- Hint Level 2: Step-by-step hints provided.
Results
The results of our experiments revealed notable insights:
- The strongest model, Qwen3-4B-Thinking-2507, successfully reinvented 50% of the algorithms with no hints, 70% at hint level 1, and 90% at hint level 2.
- While high-level hints improved the success rate, complex algorithms proved to be resistant even to step-by-step hints.
- Test-time reinforcement learning was particularly effective for the Strassen algorithm, achieving successful reinvention at hint level 2.
Discussion
Through analyses of output trajectories and ablation studies, we identified that the generative verifier plays a critical role in maintaining the reasoning capabilities of the models during the reinvention phase. This aspect is essential in preventing the “thought collapse” phenomenon, where models can lose coherence in their reasoning.
Conclusion
Our findings shed light on both the potential and limitations of LLMs in the realm of innovative thinking and foundational algorithm reinvention. While there is promising evidence of their capabilities, challenges remain, particularly with more complex algorithms. Future research will be crucial in exploring the boundaries of LLM innovation and enhancing their foundational contributions to computer science.
