Scaling Laws for Training on Consumer GPUs Under Time Limits

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs

Source: arXiv:2603.28823v1 (cross-listed)

Abstract

Scaling laws usually relate model quality to a compute budget measured in FLOPs, but practitioners are often limited by wall-clock time instead. This study asks how large a model should be when the training budget is fixed in time, from 5 minutes to 24 hours, on consumer GPUs such as the RTX 4090. It covers more than 70 training runs with model sizes from 50 million to 1 billion parameters.
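On a single GPU, the compute available in a time budget is roughly sustained throughput multiplied by wall-clock time, and a widely used approximation puts training cost at about 6·N·D FLOPs for N parameters and D tokens. The sketch below uses that approximation to estimate how many tokens the smallest and largest model sizes in the study could process in the 5-minute and 24-hour budgets. The throughput constant `ASSUMED_FLOPS_PER_SEC` and the helper `tokens_in_budget` are hypothetical; they are not measured RTX 4090 figures or code from the study.

```python
# Hypothetical sustained training throughput (FLOP/s). This is a placeholder,
# not a measured RTX 4090 figure from the study; real throughput depends on
# precision, kernels, batch size, and the model itself.
ASSUMED_FLOPS_PER_SEC = 50e12


def tokens_in_budget(n_params: float, seconds: float,
                     flops_per_sec: float = ASSUMED_FLOPS_PER_SEC) -> float:
    """Estimate how many tokens a model can train on within a wall-clock budget.

    Uses the common approximation that training costs about 6 * N * D FLOPs
    for N parameters and D tokens, so D is roughly (throughput * time) / (6 * N).
    """
    available_flops = flops_per_sec * seconds  # total FLOPs the budget allows
    return available_flops / (6.0 * n_params)


for n_params in (50e6, 1e9):  # the two ends of the model-size range studied
    for label, seconds in (("5 min", 5 * 60), ("24 h", 24 * 3600)):
        d = tokens_in_budget(n_params, seconds)
        print(f"{n_params / 1e6:>6.0f}M params, {label:>5}: ~{d:.1e} tokens")
```

Under this fixed-throughput assumption, a 1B-parameter model sees 20 times fewer tokens than a 50M model in the same time. In practice, achieved throughput also varies with model size and implementation, so equal time budgets need not correspond to equal FLOP budgets.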

Key Findings

The study reports three main findings about training under time constraints:

U-Shaped Curve: At every time budget, loss plotted against model size forms a U-shaped curve. Models that are too small tend to overfit, while models that are too large cannot be trained long enough within the budget and remain undertrained.
Optimal Model Size: The optimal model size scales as N* ∝ t^α, with a fitted exponent α = 0.60 ± 0.07. This is faster growth than the Chinchilla compute-optimal law, N* ∝ C^0.50. The two exponents are comparable because, on fixed hardware with roughly constant throughput, compute C grows linearly with time t. The fitted α stays above the compute-optimal value in every sensitivity analysis.
Dual U-Shape Mechanism: The authors propose that the U-curve has two different causes. At short budgets it is driven by a compute bottleneck; at long budgets it is driven by a data bottleneck that leads to overfitting. Between these two regimes lies an intermediate one in which the U-curve temporarily disappears.
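Estimating an exponent like α from a sweep takes two steps: for each time budget, pick the model size with the lowest loss, then fit log N* against log t with least squares. The sketch below illustrates that procedure on synthetic data generated from an assumed power law. The constants `K` and `ALPHA`, the `toy_loss` function, and the helper names are all hypothetical, and this is not necessarily the fitting procedure the paper used.

```python
import math

# Synthetic sweep for illustration only: K, ALPHA, and the toy loss below are
# hypothetical and are NOT the paper's data or fitted constants.
K, ALPHA = 1e6, 0.60
BUDGETS_S = [300, 1_800, 3_600, 4 * 3_600, 24 * 3_600]  # 5 min to 24 h


def toy_loss(n_params: float, n_opt: float) -> float:
    """A U-shape in log(model size) with its minimum at n_opt."""
    return 2.0 + 0.1 * math.log(n_params / n_opt) ** 2


runs = []  # (budget_seconds, n_params, final_loss)
for t in BUDGETS_S:
    n_opt = K * t ** ALPHA
    for factor in (0.25, 0.5, 1.0, 2.0, 4.0):  # sizes around the optimum
        n = n_opt * factor
        runs.append((t, n, toy_loss(n, n_opt)))


def best_size_per_budget(runs):
    """Step 1: for each budget, keep the model size with the lowest loss."""
    best = {}
    for t, n, loss in runs:
        if t not in best or loss < best[t][1]:
            best[t] = (n, loss)
    return {t: n for t, (n, _) in best.items()}


def fit_power_law(n_star_by_budget):
    """Step 2: ordinary least squares on log N* = log k + alpha * log t."""
    pts = [(math.log(t), math.log(n)) for t, n in n_star_by_budget.items()]
    x_mean = sum(x for x, _ in pts) / len(pts)
    y_mean = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pts)
    sxx = sum((x - x_mean) ** 2 for x, _ in pts)
    alpha = sxy / sxx
    k = math.exp(y_mean - alpha * x_mean)
    return k, alpha


n_star = best_size_per_budget(runs)
k_fit, alpha_fit = fit_power_law(n_star)
print(f"fitted alpha = {alpha_fit:.3f}")  # recovers the synthetic 0.60

# Doubling the time budget multiplies N* by 2**alpha.
print(f"2**0.60 = {2 ** 0.60:.2f}, 2**0.50 = {2 ** 0.50:.2f}")  # 1.52 vs 1.41
```

With α = 0.60, doubling the time budget multiplies the optimal size by about 1.52, compared with about 1.41 for an exponent of 0.50. With real runs, the size grid is coarse, so each per-budget optimum is only located to within a grid step, which adds uncertainty to any fitted exponent.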

Implications for Researchers

These findings matter for researchers training models on consumer hardware. The main takeaway is that wall-clock time, not FLOPs, is the binding constraint when optimizing model performance. Sizing models for the time budget, rather than borrowing compute-optimal rules, can yield training setups better matched to consumer-grade GPUs.
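Treating time as the binding constraint changes how a training run is bounded: instead of a fixed number of steps or FLOPs, training stops when the clock runs out. The sketch below shows one simple way to do that. `train_for_wall_clock` and `make_fake_step` are hypothetical helpers, with sleeping functions standing in for real training steps; the study's released code is the authoritative reference for how its runs were actually bounded.

```python
import time
from typing import Callable, Optional


def train_for_wall_clock(step_fn: Callable[[], float],
                         budget_seconds: float) -> dict:
    """Run training steps until a wall-clock budget is spent.

    The step count is not fixed in advance: a slower (larger) model simply
    completes fewer steps in the same time. The final step may overrun the
    deadline by up to one step's duration.
    """
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    deadline = start + budget_seconds
    steps = 0
    last_loss: Optional[float] = None
    while time.monotonic() < deadline:
        last_loss = step_fn()  # one forward/backward/optimizer step
        steps += 1
    return {"steps": steps, "last_loss": last_loss,
            "elapsed_s": time.monotonic() - start}


def make_fake_step(seconds_per_step: float) -> Callable[[], float]:
    """Hypothetical stand-in for a real training step of a given speed."""
    def step() -> float:
        time.sleep(seconds_per_step)
        return 0.0  # placeholder loss
    return step


fast = train_for_wall_clock(make_fake_step(0.001), budget_seconds=0.3)
slow = train_for_wall_clock(make_fake_step(0.03), budget_seconds=0.3)
print(fast["steps"], slow["steps"])  # the slower "model" completes far fewer steps
```

On a real GPU, kernel launches are asynchronous, so a harness like this would typically need to synchronize the device before reading the clock to measure elapsed time accurately.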

Future Work

Future work could examine additional factors that affect training under time constraints. A better understanding of these dynamics could lead to more robust training protocols and methods.

Resources

To support the research community, the authors are releasing all code, logs, and more than 70 experimental configurations used in the study, so that others can replicate the findings and build on them.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
