Instruction-Tuned LLMs for HPC Log Parsing & Mining

Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems

Summary of arXiv:2604.05168v1

Abstract

Leadership-class HPC systems generate massive volumes of heterogeneous, largely unstructured system logs. Because these logs originate from diverse software, hardware, and runtime layers, they exhibit inconsistent formats, making structure extraction and pattern discovery extremely challenging. Therefore, robust log parsing and mining is critical to transform this raw telemetry into actionable insights that reveal operational patterns, diagnose anomalies, and enable reliable, efficient, and scalable system analysis. Recent advances in large language models (LLMs) offer a promising new direction for automated log understanding in leadership-class HPC environments.

Introduction

In high-performance computing (HPC), log analysis is central to keeping systems running reliably. As systems grow in complexity and scale, so do the logs they generate. These logs are rich in information, but their volume and unstructured nature make them hard to use directly. Traditional log-analysis methods often struggle with this heterogeneity, which has prompted researchers to explore LLM-based approaches.

Proposed Framework

Building on recent advances in LLMs, we present a domain-adapted, instruction-following, LLM-driven framework. It uses chain-of-thought (CoT) reasoning to parse and structure HPC logs with high fidelity. Our approach combines domain-specific log-template data with instruction-tuned examples to fine-tune an 8B-parameter LLaMA model for HPC log analysis.
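To make the idea of an instruction-tuned, CoT-style training example concrete, here is a minimal sketch of how one log line and its known template might be packaged into a record. The field names (`instruction`, `input`, `output`), the prompt wording, the `<*>` placeholder convention, and the sample log line are all illustrative assumptions; the source does not describe the actual data format.

```python
# Hypothetical prompt text; the paper's actual instructions are not given in the source.
INSTRUCTION = (
    "Parse the HPC log line. Reason step by step about which tokens are "
    "variable, then output the template with each variable replaced by <*>."
)


def build_example(log_line: str, template: str, reasoning: str) -> dict:
    """Package one log line and its known template as an instruction-tuning record."""
    return {
        "instruction": INSTRUCTION,
        "input": log_line,
        # The reasoning comes first so the model learns to explain before answering.
        "output": f"Reasoning: {reasoning}\nTemplate: {template}",
    }


def extract_template(model_output: str) -> str:
    """Pull the final template out of a CoT-style response."""
    for line in reversed(model_output.splitlines()):
        if line.startswith("Template:"):
            return line[len("Template:"):].strip()
    raise ValueError("no 'Template:' line found in model output")


# Made-up log line, used only to show the record shape.
example = build_example(
    log_line="node c0-0c1s2n3 temperature 85C exceeds threshold",
    template="node <*> temperature <*> exceeds threshold",
    reasoning="The node ID and the temperature reading change between messages.",
)
```

Keeping the template on its own labeled line means the structured result can be recovered with a simple string check, even when the reasoning text varies.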

Methodology

Hybrid Fine-Tuning: We develop a hybrid fine-tuning methodology that adapts a general-purpose LLM to domain-specific log data.
Privacy-Preserving: The framework is designed to be locally deployable, so sensitive log data does not have to leave the facility.
Efficiency: The approach is optimized for fast and energy-efficient log mining, making it suitable for real-time applications.
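One way to picture the hybrid training set is as a mix of two pools: domain-specific log-template records and general instruction-following records. The sketch below is only an illustration under that assumption. The function name `mix_datasets`, the `domain_fraction` parameter, and sampling with replacement are hypothetical choices, not details from the source.

```python
import random


def mix_datasets(domain, general, domain_fraction=0.5, size=None, seed=0):
    """Build a shuffled training list drawing a fixed fraction from each pool.

    domain_fraction is an assumed knob; the source does not state a ratio.
    """
    if not 0.0 <= domain_fraction <= 1.0:
        raise ValueError("domain_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    if size is None:
        size = len(domain) + len(general)
    n_domain = round(size * domain_fraction)
    n_general = size - n_domain
    # Sample with replacement so a small pool can still fill its quota.
    mixed = rng.choices(domain, k=n_domain) + rng.choices(general, k=n_general)
    rng.shuffle(mixed)
    return mixed


domain_pool = [{"source": "log_template"}] * 3
general_pool = [{"source": "instruction"}] * 3
mixed = mix_datasets(domain_pool, general_pool, domain_fraction=0.25, size=8)
```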

Experimental Validation

We conducted experiments on a diverse set of log datasets from the LogHub repository. The evaluation confirms that our approach achieves parsing accuracy on par with significantly larger models, such as LLaMA 70B and Anthropic’s Claude. The results demonstrate the effectiveness of our fine-tuned model in handling complex log data.
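The source does not say exactly how parsing accuracy is computed. A common message-level reading is the fraction of log lines whose predicted template exactly matches the reference template, and the sketch below assumes that definition. The function name and the whitespace normalization are illustrative choices.

```python
def parsing_accuracy(predicted, reference):
    """Fraction of log lines whose predicted template equals the reference template."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0

    def normalize(template):
        # Collapse runs of whitespace so spacing differences do not count as errors.
        return " ".join(template.split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy templates: two of the three predictions match their references.
score = parsing_accuracy(
    ["connect from <*>", "job  <*> started", "link down"],
    ["connect from <*>", "job <*> started", "link <*> down"],
)
```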

Practical Application

To further validate the practical utility of our fine-tuned model, we parsed over 600 million production logs from the Frontier supercomputer over a four-week window. This analysis uncovered insights into:

Temporal dynamics of log events
Node-level anomalies
Workload-error log correlations
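Once logs are parsed into structured events, the first two kinds of analysis above reduce to simple aggregation. The following sketch assumes each parsed event is a `(timestamp, node, template)` tuple with an ISO 8601 timestamp. The tuple layout, the function names, and the "more than `factor` times the median" rule for flagging nodes are hypothetical, not the method used in the paper.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def hourly_counts(events):
    """Count events per (hour, template) to expose temporal dynamics."""
    counts = Counter()
    for timestamp, _node, template in events:
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(hour, template)] += 1
    return counts


def noisy_nodes(events, factor=5.0):
    """Flag nodes that emit far more events than the median node."""
    per_node = Counter(node for _ts, node, _template in events)
    if not per_node:
        return []
    typical = median(per_node.values())
    return sorted(n for n, c in per_node.items() if c > factor * typical)


# Made-up events: node n3 is much chattier than n1 and n2.
events = (
    [("2024-01-01T10:05:00", "n1", "link <*> down")]
    + [("2024-01-01T10:40:00", "n2", "link <*> down")]
    + [("2024-01-01T11:15:00", "n3", "ECC error at <*>")] * 12
)
by_hour = hourly_counts(events)
flagged = noisy_nodes(events)
```

A median-based threshold is used here because a few very noisy nodes would inflate a mean and hide themselves. Any production analysis would need a rule tuned to the system.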

Conclusion

The research presented in this article highlights the potential of instruction-tuned LLMs in transforming unstructured log data into actionable insights in leadership-class HPC systems. By combining advanced AI techniques with domain-specific knowledge, we pave the way for more efficient and effective log analysis, ultimately enhancing the operational capabilities of high-performance computing environments.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
