Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
arXiv:2603.29292v1 (cross-listed)
Abstract
Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable unit tests. In real-world scenarios, however, reference solutions and test oracles are much harder to obtain than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher or a test oracle?
Introduction
To answer this, we propose ConSelf, a self-improving approach built upon two key ideas:
- Code Semantic Entropy: A novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors. This allows for curriculum construction focused on the most learnable problems.
- Consensus-Driven Direct Preference Optimization (Con-DPO): A preference-based fine-tuning method that weights each preference pair based on its behavioral consensus, thereby mitigating the impact of noisy self-generated supervision.
Methodology
ConSelf is designed to let a large language model improve its own code generation without external supervision. The process begins by computing code semantic entropy for each problem: the more the model's candidate programs differ in behavior, the higher the model's uncertainty on that problem.
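As a minimal sketch of how such a score could be computed, the Python below groups sampled programs by their outputs on a shared set of test inputs and takes the Shannon entropy of the resulting cluster sizes. Clustering by exact output match, the natural-log base, and the names behavior_signature and code_semantic_entropy are assumptions made for illustration; the source does not spell out the exact formulation.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

# Stand-in for an executed candidate program (assumption: one integer argument).
Program = Callable[[int], Hashable]


def behavior_signature(program: Program, test_inputs: Sequence[int]) -> tuple:
    """Run one candidate on every test input and record what it does."""
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(("ok", program(x)))
        except Exception as exc:  # a crash is treated as a behavior too
            outputs.append(("error", type(exc).__name__))
    return tuple(outputs)


def code_semantic_entropy(programs: Sequence[Program], test_inputs: Sequence[int]) -> float:
    """Shannon entropy (in nats) over clusters of behaviorally identical programs."""
    if not programs:
        raise ValueError("need at least one sampled program")
    # Programs with the same signature fall into the same behavioral cluster.
    clusters = Counter(behavior_signature(p, test_inputs) for p in programs)
    n = len(programs)
    return sum(-(c / n) * math.log(c / n) for c in clusters.values())


# Four hypothetical samples for an "absolute value" problem.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,  # behaves like abs, so it joins the first cluster
    lambda x: x,                    # wrong on negative inputs
    lambda x: -x,                   # wrong on positive inputs
]
entropy = code_semantic_entropy(candidates, test_inputs=[-2, 0, 3])
```

Under these assumptions, an entropy of zero means every sample behaves identically on the test inputs, while larger values mean the samples disagree more. No test oracle is needed, since only agreement between outputs is measured, not correctness.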
This uncertainty metric drives a curriculum that prioritizes the problems the model can learn from most effectively. The second component, Con-DPO, fine-tunes the model on self-generated preference pairs, weighting each pair by its behavioral consensus. The weighting reduces the influence of the noise inherent in self-generated supervision, so that training leans on the more reliable pairs.
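The two steps described here can be sketched in a few lines of Python. The per-pair term is the standard DPO loss; everything around it is an assumption for illustration: the entropy band used by select_curriculum, the consensus field being a precomputed score in [0, 1], normalizing by the total weight, and all names. The source does not give ConSelf's actual selection rule or weighting formula.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Sequence log-probabilities of the chosen and rejected programs under
    # the policy being trained and under the frozen reference model.
    policy_chosen_logp: float
    policy_rejected_logp: float
    ref_chosen_logp: float
    ref_rejected_logp: float
    consensus: float  # hypothetical behavioral-consensus score in [0, 1]


def select_curriculum(entropy_by_problem: dict[str, float], low: float, high: float) -> list[str]:
    """Keep problems whose entropy lies in (low, high]; both bounds are assumptions."""
    return [pid for pid, h in entropy_by_problem.items() if low < h <= high]


def dpo_pair_loss(pair: PreferencePair, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (pair.policy_chosen_logp - pair.ref_chosen_logp) - (
        pair.policy_rejected_logp - pair.ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def consensus_weighted_loss(pairs: list[PreferencePair], beta: float = 0.1) -> float:
    """Average of per-pair DPO losses, weighted by each pair's consensus score."""
    total_weight = sum(p.consensus for p in pairs)
    if total_weight == 0:
        raise ValueError("all pairs have zero consensus weight")
    return sum(p.consensus * dpo_pair_loss(p, beta) for p in pairs) / total_weight
```

In this sketch a pair with consensus 0 contributes nothing to the loss, and a pair with consensus 1 counts in full, which is one simple way to let low-agreement pairs influence training less without discarding them outright.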
Experimental Results
Experiments on various benchmarks with different backbone LLMs indicate that:
- ConSelf significantly outperforms the baselines on code generation tasks.
- The semantic entropy-based curriculum makes training more effective by focusing it on the most learnable problems.
- Consensus-driven optimization improves the quality of the generated code by reducing the influence of less reliable self-generated preference pairs.
Conclusion
ConSelf advances the self-improvement of code generation models. By combining semantic entropy with behavioral consensus, an LLM can improve its performance without external supervision such as a teacher model or a test oracle. This makes autonomous learning for code generation a promising direction for future research.
Future Work
Future research may focus on refining the ConSelf methodology further and exploring its applicability to other domains beyond code generation. Additionally, investigating the interplay between semantic entropy and various model architectures could yield valuable insights into enhancing model performance across a broader spectrum of tasks.
