Streaming Model Cascades for Semantic SQL
Modern data warehouses are integrating semantic operators into SQL, using large language models (LLMs) to extend what a query can express. The per-row inference cost of these models, however, can be prohibitive on large datasets. Model cascades have been proposed to reduce that cost: most rows are routed through a fast proxy model, and only the uncertain cases are delegated to a more expensive oracle model.
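The routing idea can be sketched in a few lines. This is a minimal illustration rather than the method of any particular system: the function name `cascade_filter`, the callables `proxy_score` and `oracle_label`, and the fixed thresholds `low` and `high` are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def cascade_filter(
    rows: Iterable[float],
    proxy_score: Callable[[float], float],
    oracle_label: Callable[[float], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[bool], int]:
    """Label rows with a two-model cascade (hypothetical fixed thresholds).

    Rows whose proxy score is confidently high or low are decided by the
    proxy alone; everything in between is sent to the oracle.
    """
    labels: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy_score(row)
        if score >= high:
            labels.append(True)       # proxy is confident: accept
        elif score <= low:
            labels.append(False)      # proxy is confident: reject
        else:
            labels.append(oracle_label(row))  # uncertain: pay for the oracle
            oracle_calls += 1
    return labels, oracle_calls


# Toy usage: the "rows" are already scores and the oracle is a simple rule.
toy_labels, toy_calls = cascade_filter(
    [0.05, 0.5, 0.95, 0.6],
    proxy_score=lambda r: r,
    oracle_label=lambda r: r > 0.55,
)
```

With fixed thresholds, the cost saving depends entirely on how many rows fall outside the uncertain band. Choosing that band well is the hard part, and it is what adaptive cascade algorithms address.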
Existing cascade frameworks have two limitations: they require global access to the dataset, and they typically optimize a single quality metric. Requiring global access is a poor fit for distributed systems, where data is partitioned across independent workers. To address these issues, researchers have introduced two adaptive cascade algorithms designed for streaming, per-partition execution. Each worker processes its own partition independently, with no inter-worker communication.
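To make the per-partition constraint concrete, here is a rough sketch in which each worker adapts its routing band using only the oracle labels it has collected from its own partition. The band-update rule is a simple heuristic invented for illustration and carries no statistical guarantee; it is not either of the algorithms discussed here. The names `run_partition`, `run_all`, and `warmup` are hypothetical, and threads stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_partition(
    rows: Sequence[float],
    proxy_score: Callable[[float], float],
    oracle_label: Callable[[float], bool],
    warmup: int = 4,
) -> Tuple[List[bool], int]:
    """Process one partition using only state local to this call."""
    min_pos = float("inf")    # lowest score the oracle labeled positive
    max_neg = float("-inf")   # highest score the oracle labeled negative
    n_labeled = 0
    labels: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy_score(row)
        # Uncertain band: the score range not yet resolved by local labels.
        band_lo = min(min_pos, max_neg)
        band_hi = max(min_pos, max_neg)
        if n_labeled >= warmup and score < band_lo:
            labels.append(False)
        elif n_labeled >= warmup and score > band_hi:
            labels.append(True)
        else:
            label = oracle_label(row)
            oracle_calls += 1
            n_labeled += 1
            if label:
                min_pos = min(min_pos, score)
            else:
                max_neg = max(max_neg, score)
            labels.append(label)
    return labels, oracle_calls


def run_all(partitions, proxy_score, oracle_label):
    """Run every partition independently; workers share no state."""
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda p: run_partition(p, proxy_score, oracle_label), partitions)
        )


results = run_all(
    [[0.1, 0.9, 0.4, 0.6, 0.05, 0.95, 0.5, 0.45], [0.2, 0.8, 0.3, 0.7, 0.99]],
    proxy_score=lambda r: r,
    oracle_label=lambda r: r >= 0.5,
)
```

Because `run_partition` reads and writes nothing outside its own arguments and locals, the partitions can be processed in any order or on separate machines. That independence is the property the streaming setting requires.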
Introducing SUPG-IT and GAMCAL
The two algorithms are SUPG-IT and GAMCAL. Their key features are:
- SUPG-IT: Extends the SUPG statistical framework to streaming execution. It combines iterative threshold refinement with joint precision-recall guarantees, which makes it usable when data is processed in a distributed manner.
- GAMCAL: Replaces user-specified quality targets with a learned calibration model. A Generalized Additive Model (GAM) maps proxy scores to calibrated probabilities with uncertainty quantification, which enables direct optimization of the cost-quality tradeoff through a single parameter.
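The calibration idea behind the second bullet can be illustrated with a much simpler stand-in. The sketch below swaps the GAM for equal-width histogram binning with Wilson score intervals, purely to show how calibrated probabilities with uncertainty can drive routing through one parameter. It is not GAMCAL itself. The names `fit_bins`, `route`, and the tolerance `tau` are hypothetical, and proxy scores are assumed to lie in [0, 1].

```python
import math
from typing import List, Sequence, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n."""
    if n == 0:
        return 0.0, 1.0  # no data: maximally uncertain
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def bin_index(score: float, n_bins: int) -> int:
    """Map a score in [0, 1] to an equal-width bin, clamping out-of-range values."""
    return max(0, min(int(score * n_bins), n_bins - 1))


def fit_bins(
    scores: Sequence[float], labels: Sequence[bool], n_bins: int = 5
) -> List[Tuple[int, int]]:
    """Count (positives, total) per bin from oracle-labeled samples."""
    counts = [[0, 0] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        b = bin_index(score, n_bins)
        counts[b][0] += int(label)
        counts[b][1] += 1
    return [(k, n) for k, n in counts]


def route(score: float, bins: List[Tuple[int, int]], tau: float = 0.1) -> str:
    """Decide a row's fate from the calibrated interval for its bin.

    tau is the single (hypothetical) knob: the error probability tolerated
    when trusting the proxy. Smaller tau sends more rows to the oracle.
    """
    k, n = bins[bin_index(score, len(bins))]
    lo, hi = wilson_interval(k, n)
    if lo >= 1 - tau:
        return "accept"
    if hi <= tau:
        return "reject"
    return "oracle"


# Toy calibration set: confident positives, confident negatives, a mixed bin.
cal_scores = [0.95] * 50 + [0.05] * 50 + [0.5] * 50
cal_labels = [True] * 50 + [False] * 50 + [i % 2 == 0 for i in range(50)]
bins = fit_bins(cal_scores, cal_labels)
```

In this toy setup, lowering `tau` from 0.1 to 0.05 moves the high-score bin from "accept" to "oracle", because 50 samples no longer give enough certainty. This shows how a single parameter can move the operating point along the cost-quality curve.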
Performance and Results
Experiments on six datasets within a production semantic SQL engine show that both SUPG-IT and GAMCAL achieve an F1 score above 0.95 on every dataset tested. GAMCAL delivers a higher F1 score per oracle call at cost-sensitive operating points, whereas SUPG-IT reaches a higher overall quality ceiling and is backed by formal guarantees on precision and recall.
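For readers who want to reproduce this kind of comparison on their own data, the sketch below computes F1 and a cost-normalized variant. The exact definition of "F1 per oracle call" used in the experiments is not given here, so dividing F1 by the raw oracle-call count is an assumption, as are the function names.

```python
from typing import Sequence


def f1_score(predicted: Sequence[bool], truth: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_per_oracle_call(
    predicted: Sequence[bool], truth: Sequence[bool], oracle_calls: int
) -> float:
    """Assumed cost-normalized metric: F1 divided by oracle calls made."""
    if oracle_calls <= 0:
        raise ValueError("oracle_calls must be positive")
    return f1_score(predicted, truth) / oracle_calls
```

A ratio of this kind rewards a method that reaches acceptable quality with few oracle calls, which may explain why an approach can lead on it at cost-sensitive operating points without having the highest quality ceiling.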
Conclusion
Streaming model cascades address a practical gap in semantic SQL processing. Because each worker executes independently on its own partition, SUPG-IT and GAMCAL avoid the global-access requirement of earlier cascade frameworks. GAMCAL exposes the cost-quality tradeoff through a single parameter, while SUPG-IT offers formal precision and recall guarantees. As datasets grow, adaptive algorithms of this kind may prove essential to running semantic SQL efficiently at scale.
