Natural Language to PromQL: Framework for Cloud Observability

From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

Source: arXiv:2604.13048v1, announce type: cross

Abstract: Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site reliability teams. We present a catalog-driven framework that translates natural language questions into executable PromQL queries, bridging the gap between human intent and observability data.

Key Contributions

Our approach introduces three primary contributions to enhance cloud-native observability:

  • Hybrid Metrics Catalog:

    This catalog combines a statically curated base of approximately 2,000 metrics with runtime discovery of hardware-specific signals across GPU vendors. The static base covers metrics known in advance, while runtime discovery picks up signals that depend on the hardware actually present in a cluster.

  • Multi-Stage Query Pipeline:

    The framework uses a multi-stage query pipeline with three stages:

    • Intent classification, which determines what kind of question the user is asking.
    • Category-aware metric routing, which narrows the search to the metric categories relevant to that intent.
    • Multi-dimensional semantic scoring, which ranks candidate metrics against the question.

  • Dynamic Temporal Resolution:

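    This mechanism interprets diverse natural language time expressions and maps them to the appropriate PromQL duration syntax. For example, a phrase such as "the last 30 minutes" corresponds to the PromQL range selector [30m], so users can state time ranges in plain language instead of writing durations by hand.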
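
The temporal mapping described in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `resolve_duration` and `avg_over_window`, the unit table, the regular expression, and the 5m fallback are all assumptions made for this example. Only the PromQL duration suffixes (s, m, h, d, w) and the `avg_over_time` function are standard PromQL.

```python
import re

# Hypothetical unit table: natural-language unit words -> PromQL duration suffixes.
_UNITS = {
    "second": "s", "sec": "s",
    "minute": "m", "min": "m",
    "hour": "h", "hr": "h",
    "day": "d",
    "week": "w",
}

# Matches phrases like "last 30 minutes", "past 2 hours", or "last week".
# The count is optional; a missing count is treated as 1.
_PATTERN = re.compile(r"\b(?:last|past|previous)\s+(?:(\d+)\s+)?([a-z]+?)s?\b")


def resolve_duration(question: str, default: str = "5m") -> str:
    """Return a PromQL duration for the time phrase in `question`.

    Falls back to `default` (an assumed value) when no phrase is recognized.
    """
    match = _PATTERN.search(question.lower())
    if match is None:
        return default
    count, unit = match.group(1), match.group(2)
    suffix = _UNITS.get(unit)
    if suffix is None:
        return default
    return f"{int(count) if count else 1}{suffix}"


def avg_over_window(metric: str, question: str) -> str:
    """Wrap `metric` in avg_over_time over the window named in `question`."""
    return f"avg_over_time({metric}[{resolve_duration(question)}])"


if __name__ == "__main__":
    print(avg_over_window("DCGM_FI_DEV_GPU_UTIL",
                          "average GPU utilization over the last 30 minutes"))
```

A production resolver would also need to handle absolute times, calendar phrases such as "yesterday", and offsets, which this sketch deliberately leaves out.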

Integration with the Model Context Protocol (MCP)

We have integrated the framework with the Model Context Protocol (MCP) to enable tool-augmented large language model (LLM) interactions across multiple providers. Through MCP, an LLM from any supported provider can invoke the framework as a tool when a user asks a question about observability data.
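
To make the tool-call idea concrete, the sketch below builds the kind of JSON-RPC 2.0 request an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameter layout come from the MCP specification; the tool name `nl_to_promql` and its `question` argument are hypothetical, since the summary does not say which tools the framework exposes.

```python
import json


def make_tool_call(request_id: int, question: str) -> str:
    """Serialize an MCP tools/call request for a hypothetical NL-to-PromQL tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "nl_to_promql",               # hypothetical tool name
            "arguments": {"question": question},  # hypothetical argument schema
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(make_tool_call(1, "What was GPU utilization over the last hour?"))
```

Because the request format is provider-neutral, the same tool definition can in principle be offered to any LLM client that speaks MCP.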

Performance and Deployment

The catalog-driven approach achieves sub-second metric discovery through pre-computed category indices, and the full query pipeline completes in approximately 1.1 seconds via the catalog path. The system has been deployed in production Kubernetes clusters managing AI inference workloads, where it supports natural language querying across approximately 2,000 metrics. These metrics span:

  • Cluster health
  • GPU utilization
  • Model-serving performance
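
One way to picture a pre-computed category index is as a dictionary built once at startup, so that each question only scans the metrics in its routed category. The sketch below is an assumption-laden illustration: the five catalog entries, their descriptions, the routing keywords, and the token-overlap score are all invented for this example, and the overlap score is a simple stand-in for the multi-dimensional semantic scoring the framework describes.

```python
import re
from collections import defaultdict

# Illustrative catalog entries: (metric name, category, description).
# These are examples chosen for the sketch, not entries from the paper.
CATALOG = [
    ("DCGM_FI_DEV_GPU_UTIL", "gpu", "gpu utilization percent"),
    ("DCGM_FI_DEV_FB_USED", "gpu", "gpu framebuffer memory used"),
    ("kube_pod_status_phase", "cluster", "pod phase status pending running failed"),
    ("kube_node_status_condition", "cluster", "node condition ready health"),
    ("vllm:time_to_first_token_seconds", "serving", "model serving latency time to first token"),
]

# Hypothetical routing keywords for each category.
CATEGORY_KEYWORDS = {
    "gpu": {"gpu", "memory", "utilization"},
    "cluster": {"pod", "node", "cluster", "health"},
    "serving": {"model", "latency", "token", "serving"},
}


def build_index(catalog):
    """Group metrics by category once, so lookups avoid scanning the full catalog."""
    index = defaultdict(list)
    for name, category, description in catalog:
        index[category].append((name, set(description.split())))
    return dict(index)


INDEX = build_index(CATALOG)  # pre-computed at import time


def find_metric(question: str):
    """Route the question to a category, then pick the best-overlapping metric."""
    tokens = set(re.findall(r"[a-z0-9_]+", question.lower()))
    category = max(CATEGORY_KEYWORDS, key=lambda c: len(tokens & CATEGORY_KEYWORDS[c]))
    best_name, _ = max(INDEX[category], key=lambda item: len(tokens & item[1]))
    return category, best_name


if __name__ == "__main__":
    print(find_metric("What is the GPU utilization?"))
```

Routing first keeps the scoring step small: with roughly 2,000 metrics split across categories, only one category's entries are compared against each question.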

Conclusion

By translating natural language questions into executable PromQL, this catalog-driven framework lowers the query-writing barrier faced by platform engineers and site reliability teams. Its design combines a hybrid metrics catalog, a multi-stage query pipeline, and dynamic temporal resolution, and it has been exercised in production Kubernetes clusters running AI inference workloads.


Lazarus Omolua
https://richlyai.com/blog