Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
arXiv:2604.09048v1 (cross-listed)
Abstract: While the large energy consumption of Large Language Models (LLMs) is recognized by the community, system operators lack guidance for energy-efficient LLM inference deployments that leverage energy trade-offs of heterogeneous hardware due to a lack of energy-aware benchmarks and data. In this work we address this gap with Watt Counts: the largest open-access dataset of energy consumption of LLMs, with over 5,000 experiments for 50 LLMs across 10 NVIDIA Graphics Processing Units (GPUs) in batch and server scenarios along with a reproducible, open-source benchmark that enables community submissions to expand this dataset. Leveraging this dataset, we conduct a system-level study of LLM inference across heterogeneous GPU architectures and show that GPU selection is crucial for energy efficiency outcomes and that optimal hardware choices vary significantly across models and deployment scenarios, demonstrating the critical importance of hardware-aware deployment in heterogeneous LLM systems. Guided by our data and insights, we show that practitioners can reduce energy consumption by up to 70% in server scenarios with negligible impact on user experience, and by up to 20% in batch scenarios.
Introduction
Large Language Models (LLMs) provide advanced natural language processing capabilities, but their large energy consumption raises concerns about sustainability and operational cost. System operators currently lack guidance on energy-efficient inference deployments because energy-aware benchmarks and data are scarce. Watt Counts addresses this gap with an open-access dataset and a reproducible benchmark for energy-aware LLM inference.
Overview of Watt Counts
Watt Counts is designed to fill the existing gap in energy benchmarks for LLMs. The initiative offers:
- A vast open-access dataset containing over 5,000 experiments.
- Information on 50 different LLMs tested across 10 diverse NVIDIA GPU models.
- Measurements covering both batch and server scenarios.
- A reproducible, open-source benchmark encouraging community participation.
Significance of the Dataset
The authors describe Watt Counts as the largest open-access dataset of LLM energy consumption. It lets researchers and practitioners compare energy use across LLMs and choose GPUs with energy efficiency in mind. Key insights from the authors' system-level study include:
- Optimal GPU choices vary significantly across different LLMs and deployment scenarios.
- GPU selection is crucial for energy efficiency outcomes, which makes hardware-aware deployment important in heterogeneous LLM systems.
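The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' benchmark code: the record fields (`model`, `scenario`, `gpu`, `energy_joules`, `output_tokens`), the model and GPU names, and all numbers are hypothetical placeholders, and energy per output token is assumed as the comparison metric.

```python
# Hypothetical records: field names and values are illustrative placeholders,
# not the Watt Counts schema or its measurements.
RECORDS = [
    {"model": "model-a", "scenario": "server", "gpu": "gpu-x",
     "energy_joules": 1200.0, "output_tokens": 4000},
    {"model": "model-a", "scenario": "server", "gpu": "gpu-y",
     "energy_joules": 900.0, "output_tokens": 4000},
    {"model": "model-a", "scenario": "batch", "gpu": "gpu-x",
     "energy_joules": 5000.0, "output_tokens": 20000},
    {"model": "model-a", "scenario": "batch", "gpu": "gpu-y",
     "energy_joules": 6000.0, "output_tokens": 20000},
    {"model": "model-b", "scenario": "server", "gpu": "gpu-x",
     "energy_joules": 700.0, "output_tokens": 3500},
    {"model": "model-b", "scenario": "server", "gpu": "gpu-y",
     "energy_joules": 1050.0, "output_tokens": 3500},
]


def joules_per_token(record):
    """Energy per generated token (assumed metric for this sketch)."""
    return record["energy_joules"] / record["output_tokens"]


def best_gpu_per_workload(records):
    """Map each (model, scenario) pair to its lowest-energy GPU.

    Returns a dict: (model, scenario) -> (gpu, joules_per_token).
    """
    best = {}
    for record in records:
        key = (record["model"], record["scenario"])
        cost = joules_per_token(record)
        # Keep the GPU with the smallest energy per token seen so far.
        if key not in best or cost < best[key][1]:
            best[key] = (record["gpu"], cost)
    return best


if __name__ == "__main__":
    for (model, scenario), (gpu, cost) in sorted(best_gpu_per_workload(RECORDS).items()):
        print(f"{model} / {scenario}: {gpu} ({cost:.3f} J/token)")
```

In this toy data the preferred GPU for `model-a` differs between the server and batch scenarios, which mirrors the kind of variation the authors report. A real selection would also need to check latency or throughput constraints, which this sketch ignores.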
Impact on Energy Consumption
The headline result is how much energy a data-guided hardware choice can save. The authors report:
- In server scenarios, practitioners can reduce energy consumption by up to 70% with negligible impact on user experience.
- In batch scenarios, energy consumption can be reduced by up to 20%.
These findings suggest that hardware-aware deployment can substantially lower the energy cost of LLM inference.
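To make the percentages concrete, the arithmetic behind a relative saving can be sketched as follows. The function names and the baseline value are hypothetical; only the 70% and 20% figures come from the abstract, and both are upper bounds ("up to"), so actual savings depend on the model, GPU, and scenario.

```python
def relative_saving(baseline_joules: float, optimized_joules: float) -> float:
    """Fraction of the baseline energy saved by the optimized deployment."""
    if baseline_joules <= 0:
        raise ValueError("baseline_joules must be positive")
    return (baseline_joules - optimized_joules) / baseline_joules


def energy_after_saving(baseline_joules: float, saving_fraction: float) -> float:
    """Energy that remains after applying a relative saving in [0, 1]."""
    if not 0.0 <= saving_fraction <= 1.0:
        raise ValueError("saving_fraction must be between 0 and 1")
    return baseline_joules * (1.0 - saving_fraction)


# Upper bounds quoted in the abstract; they are not typical or guaranteed values.
MAX_SAVING = {"server": 0.70, "batch": 0.20}

if __name__ == "__main__":
    baseline = 1000.0  # hypothetical baseline energy in joules, not a measurement
    for scenario, bound in MAX_SAVING.items():
        remaining = energy_after_saving(baseline, bound)
        print(f"{scenario}: up to {bound:.0%} saved, at least {remaining:.0f} J remaining")
```

The same `relative_saving` helper can be applied to two measured deployments of one workload to check how close a given hardware change comes to these bounds.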
Conclusion
The Watt Counts project is a step toward more sustainable LLM inference. By providing an open-access dataset and a reproducible, open-source benchmark, it gives system operators the data to weigh energy efficiency against user experience when choosing hardware. Because the benchmark accepts community submissions, the dataset can continue to expand.
