HYVE: Hybrid Views for LLM Context Engineering over Machine Data
Summary: arXiv:2604.05400v1
Abstract: Machine data is central to observability and diagnosis in modern computing systems, appearing in logs, metrics, telemetry traces, and configuration snapshots. When provided to large language models (LLMs), this data typically arrives as a mixture of natural language and structured payloads such as JSON or Python/AST literals. Yet LLMs remain brittle on such inputs, particularly when they are long, deeply nested, and dominated by repetitive structure.
Introduction to HYVE
We present HYVE (HYbrid ViEw), a framework for LLM context engineering tailored to inputs that contain large machine-data payloads. The approach is inspired by database management principles.
Framework Overview
HYVE surrounds model invocation with coordinated preprocessing and postprocessing steps, centered on a request-scoped datastore augmented with schema information. The framework operates in two main phases:
- Preprocessing: During this phase, HYVE detects repetitive structures in raw inputs and materializes them within the datastore. It transforms the data into hybrid columnar and row-oriented views, selectively exposing only the most relevant representation to the LLM.
- Postprocessing: Following model invocation, HYVE can return the model output directly, query the datastore to recover omitted information, or perform an additional LLM call for SQL-augmented semantic synthesis.
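The two phases above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the payload is a flat JSON list of records with scalar values, uses an in-memory SQLite database as a stand-in for the request-scoped datastore, and the names detect_repetitive, materialize, and columnar_view are hypothetical helpers introduced only for this example.

```python
import json
import sqlite3


def detect_repetitive(payload):
    """Return True if payload is a list of 2+ dicts that all share one key set."""
    if not isinstance(payload, list) or len(payload) < 2:
        return False
    if not all(isinstance(r, dict) for r in payload):
        return False
    keys = set(payload[0])
    return all(set(r) == keys for r in payload)


def materialize(conn, table, records):
    """Store the records in a table (assumed flat, scalar values); return column names."""
    cols = list(records[0])
    col_defs = ", ".join(f'"{c}"' for c in cols)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in cols) for r in records],
    )
    return cols


def columnar_view(records, cols, max_rows=3):
    """Compact view for the prompt: each key once, with a truncated value list."""
    shown = records[:max_rows]
    return {
        "columns": {c: [r[c] for r in shown] for c in cols},
        "rows_total": len(records),
        "rows_shown": len(shown),
    }


# Hypothetical payload: repeated interface counters, as might appear in telemetry.
raw = json.dumps([
    {"ts": 1, "iface": "eth0", "errors": 0},
    {"ts": 2, "iface": "eth0", "errors": 0},
    {"ts": 3, "iface": "eth1", "errors": 7},
    {"ts": 4, "iface": "eth0", "errors": 0},
])
payload = json.loads(raw)

# Preprocessing: detect the repeated structure, materialize it, expose a compact view.
conn = sqlite3.connect(":memory:")  # lives only for this request
is_repetitive = detect_repetitive(payload)
cols = materialize(conn, "payload", payload)
prompt_view = columnar_view(payload, cols, max_rows=2)

# Postprocessing: recover rows that were omitted from the prompt view via SQL.
omitted_hits = conn.execute(
    'SELECT ts, iface FROM "payload" WHERE errors > 0'
).fetchall()
```

In this sketch the prompt would carry only prompt_view, while the full records stay queryable in the datastore. How HYVE actually chooses between columnar and row-oriented views, and when it issues the additional LLM call, is not specified in this summary.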
Performance Evaluation
We evaluate the effectiveness of HYVE across diverse real-world workloads, which include:
- Knowledge Question Answering (QA)
- Chart Generation
- Anomaly Detection
- Multi-Step Network Troubleshooting
Across these benchmarks, HYVE reduces token usage by 50-90% while maintaining or improving output quality. On structured generation tasks, it improves chart-generation accuracy by up to 132% and reduces latency by up to 83%.
Conclusion
Overall, HYVE offers a practical way to manage LLM context windows when inputs contain large machine-data payloads. By applying database management principles around model invocation, it cuts token usage and latency while maintaining or improving output quality.
