From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability
Summary: arXiv:2604.13048v1 Announce Type: cross
Abstract: Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site reliability teams. We present a catalog-driven framework that translates natural language questions into executable PromQL queries, bridging the gap between human intent and observability data.
Key Contributions
Our approach introduces three primary contributions to enhance cloud-native observability:
-
Hybrid Metrics Catalog:
This catalog combines a statically curated base of approximately 2,000 metrics with runtime discovery of hardware-specific signals across various GPU vendors. This ensures a comprehensive and adaptable approach to metric management.
-
Multi-Stage Query Pipeline:
The framework features an advanced multi-stage query pipeline that encompasses:
- Intent classification to understand user queries.
- Category-aware metric routing for efficient data retrieval.
- Multi-dimensional semantic scoring to enhance query accuracy.
-
Dynamic Temporal Resolution:
This mechanism interprets diverse natural language time expressions, mapping them to the appropriate PromQL duration syntax. It allows users to specify time ranges in a more intuitive manner.
Integration with the Model Context Protocol (MCP)
We have integrated the framework with the Model Context Protocol (MCP) to enable tool-augmented large language model (LLM) interactions across multiple providers. This integration facilitates a more seamless user experience when querying observability data.
Performance and Deployment
The catalog-driven approach achieves sub-second metric discovery through pre-computed category indices. The full query pipeline completes in approximately 1.1 seconds via the catalog path, providing rapid responses to user queries. The system has been deployed in production Kubernetes clusters managing AI inference workloads, demonstrating its capability to support natural language querying across approximately 2,000 metrics. These metrics encompass critical aspects such as:
- Cluster health
- GPU utilization
- Model-serving performance
Conclusion
By bridging the gap between natural language and PromQL, this catalog-driven framework significantly reduces the barriers faced by platform engineers and site reliability teams. Its innovative design, combining a hybrid metrics catalog with a robust query pipeline and dynamic temporal resolution, empowers users to harness the full potential of observability data in cloud-native environments.
