Automating Database-Native Function Code Synthesis with LLMs
arXiv:2604.06231v1 (announce type: cross)
Abstract
Database systems increasingly build functions directly into their kernels; these are known as database-native functions. Such functions are essential for supporting new applications and for easing business migrations between systems. As their number grows, there is an urgent need to synthesize database-native functions automatically.
Recent advances in large language model (LLM)-based code generation, exemplified by tools such as Claude Code, have shown promise. However, these tools are often too generic for the intricacies of database-specific development. They frequently hallucinate or overlook critical contextual details, which leads to errors in function synthesis, a task that is already complex and error-prone. Synthesizing even a single function may require registering multiple function units, linking internal references between them, and implementing the core logic correctly.
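The following sketch is a rough analogy for what "registering" and "implementing" a function involve. It uses Python's standard `sqlite3` module to register an application-defined SQL function. This is not how native functions are added to the SQLite kernel, where they are compiled into the C source. The function name `reverse_text` is hypothetical. The point is only that a function needs both an implementation and a registration entry (name, arity, flags) before SQL can call it. The `deterministic` flag requires Python 3.8+ and SQLite 3.8.3+.

```python
import sqlite3

def reverse_text(value):
    """Implementation unit: the logic the SQL function runs (NULL in, NULL out)."""
    if value is None:
        return None
    return str(value)[::-1]

def open_with_function() -> sqlite3.Connection:
    """Open an in-memory database with the hypothetical reverse_text function registered."""
    conn = sqlite3.connect(":memory:")
    # Registration unit: bind a SQL name and an arity (1 argument) to the implementation.
    # deterministic=True tells SQLite the function always returns the same output for
    # the same input, which allows it in more contexts (e.g. indexes on expressions).
    conn.create_function("reverse_text", 1, reverse_text, deterministic=True)
    return conn

if __name__ == "__main__":
    conn = open_with_function()
    print(conn.execute("SELECT reverse_text('DBCooker')").fetchone()[0])  # prints rekooCBD
    conn.close()
```

A kernel-level native function typically has more moving parts than this (declarations, registration tables, shared helpers), which is why a single missed link can break the whole synthesis.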
Introducing DBCooker
To address these challenges, we propose DBCooker, an LLM-based system for the automatic synthesis of database-native functions. DBCooker comprises three key components:
- Function Characterization Module: This module aggregates function declarations from multiple sources, identifies the function units that require specialized coding, and traces dependencies across units.
- Synthesis Operations: We have developed operations to tackle the primary synthesis challenges:
  - Pseudo-code-based Coding Plan Generator: Constructs structured implementation skeletons by identifying key elements such as reusable referenced functions.
  - Hybrid Fill-in-the-Blank Model: Guided by probabilistic priors and component awareness, this model integrates core logic with reusable routines.
  - Three-Level Progressive Validation: Includes syntax checking, standards compliance, and LLM-guided semantic verification.
- Adaptive Orchestration Strategy: This strategy unifies the aforementioned operations with existing tools and dynamically sequences them based on the orchestration history of similar functions.
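Tracing cross-unit dependencies in the characterization step suggests an ordering problem: a unit should be written only after the units it references exist. The minimal sketch below assumes dependencies have already been extracted into a mapping from each hypothetical unit name to the units it references. It then orders the units with a topological sort from Python's standard `graphlib` module (Python 3.9+). It illustrates the idea and is not DBCooker's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical function units for one native function, mapped to the units each one
# references. The names are illustrative only, not taken from any real kernel.
UNIT_DEPS = {
    "register_entry": {"impl_core", "type_check"},
    "impl_core": {"utf8_helper", "type_check"},
    "type_check": set(),
    "utf8_helper": set(),
    "docs_entry": {"register_entry"},
}

def synthesis_order(deps):
    """Return units ordered so that every unit comes after all units it references.

    Raises graphlib.CycleError if the traced references form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())

if __name__ == "__main__":
    print(synthesis_order(UNIT_DEPS))
```

Several valid orders may exist, so the exact output can vary. A `CycleError` is a useful early signal that the traced references are inconsistent.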
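The coding plan and the Hybrid Fill-in-the-Blank Model are described only at a high level. One hypothetical reading is that a plan is a skeleton with named blanks. Each blank is filled with a call to an existing routine when a prior probability says one fits well. Otherwise, the blank is handed to a generator (an LLM in DBCooker) for new core logic. The names `Blank` and `fill_skeleton`, the 0.6 threshold, and the stub generator are all assumptions, not DBCooker's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Blank:
    """One slot in a hypothetical coding-plan skeleton."""
    name: str                        # e.g. "decode_input" or "core_logic"
    reuse_prior: Dict[str, float]    # assumed P(routine fits this slot) per candidate routine

def fill_skeleton(blanks: List[Blank],
                  generate: Callable[[str], str],
                  threshold: float = 0.6) -> Dict[str, str]:
    """Fill each blank with a reused routine call or with newly generated code."""
    filled: Dict[str, str] = {}
    for blank in blanks:
        # Pick the candidate routine with the highest prior, if any exist.
        if blank.reuse_prior:
            best, prob = max(blank.reuse_prior.items(), key=lambda kv: kv[1])
        else:
            best, prob = "", 0.0
        if prob >= threshold:
            filled[blank.name] = f"call {best}"         # reuse an existing routine
        else:
            filled[blank.name] = generate(blank.name)  # fall back to generated logic
    return filled

if __name__ == "__main__":
    plan = [
        Blank("decode_input", {"sqlite3_value_text": 0.9, "custom_decode": 0.2}),
        Blank("core_logic", {}),
    ]
    print(fill_skeleton(plan, generate=lambda slot: f"<generated code for {slot}>"))
```

Here `decode_input` reuses SQLite's `sqlite3_value_text` routine because its assumed prior clears the threshold, while `core_logic` has no candidates and is generated.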
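Three-Level Progressive Validation runs cheap checks before expensive ones and stops at the first failure. The sketch below shows that control flow under stated assumptions. Candidates are Python source rather than database C code. "Standards compliance" is approximated by running reference test cases. The LLM-guided semantic check is replaced by an injected `judge` callable. The function `validate` and its parameters are hypothetical.

```python
import ast
from typing import Callable, List, Optional, Tuple

Case = Tuple[tuple, object]  # (arguments, expected result)

def validate(source: str, func_name: str, cases: List[Case],
             judge: Callable[[str], bool]) -> Optional[str]:
    """Return None if all three levels pass, else the name of the first failing level."""
    # Level 1: syntax. Cheapest, so it runs first.
    try:
        ast.parse(source)
    except SyntaxError:
        return "syntax"
    # Level 2: compliance, approximated here by executing reference test cases.
    # exec runs untrusted generated code; a real system would sandbox this step.
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        if any(func(*args) != expected for args, expected in cases):
            return "compliance"
    except Exception:
        return "compliance"
    # Level 3: semantic review, delegated to a judge (LLM-guided in DBCooker, a stub here).
    if not judge(source):
        return "semantic"
    return None

if __name__ == "__main__":
    good = "def add_one(x):\n    return x + 1\n"
    print(validate(good, "add_one", [((1,), 2)], judge=lambda src: True))  # prints None
```

Ordering the levels this way avoids spending an LLM call on code that does not even parse.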
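The orchestration strategy is summarized only as sequencing operations "based on the orchestration history of similar functions." The sketch below is a minimal version of that idea. It reuses the operation sequence of the most similar past function and otherwise falls back to a default order. It assumes functions are described by hypothetical feature tags and that similarity is Jaccard overlap between tag sets. That measure is a stand-in chosen for illustration, since the actual similarity measure is not given here. The operation name `compile_and_test` stands in for an "existing tool" and is also hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical default sequence of synthesis operations.
DEFAULT_PLAN = ["characterize", "plan", "fill", "validate"]

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two tag sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def choose_sequence(features: FrozenSet[str],
                    history: List[Tuple[FrozenSet[str], List[str]]],
                    min_similarity: float = 0.5) -> List[str]:
    """Reuse the sequence of the most similar past function, else the default plan."""
    best_score, best_seq = 0.0, DEFAULT_PLAN
    for past_features, past_seq in history:
        score = jaccard(features, past_features)
        if score > best_score:
            best_score, best_seq = score, past_seq
    return list(best_seq) if best_score >= min_similarity else list(DEFAULT_PLAN)

if __name__ == "__main__":
    history = [
        (frozenset({"string", "scalar", "utf8"}),
         ["characterize", "plan", "fill", "validate", "compile_and_test"]),
        (frozenset({"aggregate", "numeric"}), ["characterize", "fill", "validate"]),
    ]
    print(choose_sequence(frozenset({"string", "scalar"}), history))
```

A new string function shares two of three tags with the first past function (similarity 2/3), so it inherits that function's sequence. A function with no similar history gets the default plan.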
Performance Results
Initial results on popular database systems, namely SQLite, PostgreSQL, and DuckDB, show that DBCooker significantly outperforms existing methods, achieving 34.55% higher accuracy on average. DBCooker has also synthesized new functions that are absent from the latest SQLite release (v3.50).
Conclusion
DBCooker marks a significant advance in database function synthesis. By combining LLMs with explicit handling of database-specific complexities, such as multiple function units, cross-unit references, and layered validation, it improves both the efficiency and the accuracy of function synthesis. As demand for more sophisticated database functions continues to grow, systems like DBCooker are likely to play a pivotal role in streamlining their development.
