Natural Language to PromQL: Framework for Cloud Observability

From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

Summary: arXiv:2604.13048v1 Announce Type: cross

Abstract: Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site reliability teams. We present a catalog-driven framework that translates natural language questions into executable PromQL queries, bridging the gap between human intent and observability data.

Key Contributions

Our approach introduces three primary contributions to enhance cloud-native observability:

Hybrid Metrics Catalog:

This catalog combines a statically curated base of approximately 2,000 metrics with runtime discovery of hardware-specific signals across various GPU vendors. This ensures a comprehensive and adaptable approach to metric management.
Multi-Stage Query Pipeline:

The framework features an advanced multi-stage query pipeline that encompasses:
- Intent classification to understand user queries.
- Category-aware metric routing for efficient data retrieval.
- Multi-dimensional semantic scoring to enhance query accuracy.
Dynamic Temporal Resolution:

This mechanism interprets diverse natural language time expressions, mapping them to the appropriate PromQL duration syntax. It allows users to specify time ranges in a more intuitive manner.

Integration with the Model Context Protocol (MCP)

We have integrated the framework with the Model Context Protocol (MCP) to enable tool-augmented large language model (LLM) interactions across multiple providers. This integration facilitates a more seamless user experience when querying observability data.

Performance and Deployment

The catalog-driven approach achieves sub-second metric discovery through pre-computed category indices. The full query pipeline completes in approximately 1.1 seconds via the catalog path, providing rapid responses to user queries. The system has been deployed in production Kubernetes clusters managing AI inference workloads, demonstrating its capability to support natural language querying across approximately 2,000 metrics. These metrics encompass critical aspects such as:

Cluster health
GPU utilization
Model-serving performance

Conclusion

By bridging the gap between natural language and PromQL, this catalog-driven framework significantly reduces the barriers faced by platform engineers and site reliability teams. Its innovative design, combining a hybrid metrics catalog with a robust query pipeline and dynamic temporal resolution, empowers users to harness the full potential of observability data in cloud-native environments.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!

Company

Natural Language to PromQL: Framework for Cloud Observability

From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

Key Contributions

Integration with the Model Context Protocol (MCP)

Performance and Deployment

Conclusion

Related AI Insights

Subscribe

More like thisRelated

About us

Company

The latest

Subscribe

RichlyAI Blog
AI Guide, Tutorials, Industrial Insights, & more!

More like this
Related