Cost-Aware Model Orchestration for LLM-based Systems
Summary: arXiv:2512.01099v2 Announce Type: replace
Abstract: As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. The task of orchestrating these models is increasingly performed by Large Language Models (LLMs) that rely on qualitative descriptions of models for decision-making. However, the descriptions provided to existing LLM-based orchestrators frequently do not reflect true model capabilities and performance characteristics, leading to suboptimal model selection, reduced task accuracy, and increased cost.
In this paper, we conduct an empirical analysis of LLM-based orchestration limitations and propose a cost-aware model selection method that accounts for performance-cost trade-offs by incorporating quantitative model performance characteristics within decision-making. Initial experimental results demonstrate that our proposed method increases accuracy by 0.90%-11.92% across various evaluated tasks, achieves up to a 54% energy efficiency improvement, and reduces orchestrator model selection latency from 4.51 s to 7.2 ms.
Introduction
Modern AI systems can orchestrate many models and tools to handle a single complex task. Existing LLM-based orchestrators often fall short because they rely on qualitative descriptions that do not accurately represent model capabilities. This mismatch leads to suboptimal model selection, lower task accuracy, and higher operational cost.
Challenges in LLM-based Orchestration
Current orchestration methods face several challenges:
- Inaccurate Model Descriptions: The qualitative descriptions given to LLM-based orchestrators frequently do not reflect true model capabilities and performance characteristics.
- Suboptimal Model Selection: The inability to accurately assess model performance leads to poor decision-making in model selection.
- Increased Costs: Inefficient orchestration can escalate operational costs, negating the advantages offered by AI systems.
Proposed Cost-Aware Model Selection Method
To address the limitations of current LLM-based orchestration, we propose a novel cost-aware model selection method. This method integrates quantitative performance metrics into the decision-making process, allowing for a more accurate evaluation of model capabilities. Key features include:
- Performance-Cost Trade-offs: Our approach considers both the performance and cost implications of model selection, enabling more informed decisions.
- Empirical Validation: We ran initial experiments to validate the method across several evaluated tasks.
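The performance-cost trade-off described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the `ModelProfile` fields, the weighted scoring rule, the `cost_weight` and `min_accuracy` parameters, and all numbers in `PROFILES` are hypothetical and chosen only to show how measured accuracy and cost might replace free-text model descriptions in a selection step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Quantitative profile of one candidate model (hypothetical fields)."""
    name: str
    accuracy: float  # measured task accuracy in [0, 1]
    cost: float      # cost per request in arbitrary units (e.g. energy)


def select_model(profiles, min_accuracy=0.0, cost_weight=0.5):
    """Pick the profile with the best accuracy-versus-cost score.

    Assumed scoring rule (illustrative only):
        score = (1 - cost_weight) * accuracy - cost_weight * normalized_cost
    where normalized_cost is the cost divided by the largest eligible cost.
    """
    eligible = [p for p in profiles if p.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no model meets the accuracy requirement")

    max_cost = max(p.cost for p in eligible)

    def score(p):
        normalized_cost = p.cost / max_cost if max_cost > 0 else 0.0
        return (1 - cost_weight) * p.accuracy - cost_weight * normalized_cost

    return max(eligible, key=score)


# Made-up profiles for illustration; these are not results from the paper.
PROFILES = [
    ModelProfile("small-model", accuracy=0.78, cost=1.0),
    ModelProfile("medium-model", accuracy=0.86, cost=3.0),
    ModelProfile("large-model", accuracy=0.88, cost=10.0),
]

if __name__ == "__main__":
    print(select_model(PROFILES).name)                    # balanced weighting
    print(select_model(PROFILES, cost_weight=0.0).name)   # accuracy only
    print(select_model(PROFILES, min_accuracy=0.8).name)  # accuracy floor
```

Because the selection step in this sketch is a plain arithmetic comparison over stored measurements rather than an LLM call, it runs without any model inference, which is one plausible reason a quantitative selector could be much faster than a prompt-based one.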
Results and Discussion
The initial results from our experiments are promising:
- Accuracy improved by 0.90% to 11.92% across the evaluated tasks.
- Energy efficiency improved by up to 54%.
- Orchestrator model selection latency dropped from 4.51 s to 7.2 ms.
These findings indicate that a cost-aware approach improves selection accuracy while reducing the energy and latency cost of orchestration.
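To put the latency figures on a common scale, the short calculation below converts both reported values to seconds and derives the speedup factor and the relative reduction. Only the two latency values come from the reported results; the variable names are illustrative.

```python
# Reported orchestrator model selection latencies, converted to seconds.
baseline_latency_s = 4.51     # LLM-based selection
proposed_latency_s = 7.2e-3   # proposed method (7.2 ms)

# Speedup factor: how many times faster the proposed selection step is.
speedup = baseline_latency_s / proposed_latency_s

# Relative reduction in latency, as a percentage of the baseline.
reduction_pct = (1 - proposed_latency_s / baseline_latency_s) * 100

print(f"speedup: about {speedup:.0f}x")
print(f"latency reduction: {reduction_pct:.2f}%")
```

The ratio works out to roughly 626x, meaning the selection step takes about 0.16% of its original time.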
Conclusion
As AI continues to evolve, the need for effective orchestration methods becomes ever more critical. Our proposed cost-aware model selection strategy addresses key limitations of existing LLM-based systems, paving the way for more accurate, efficient, and cost-effective AI solutions.
