Meta-Harness: End-to-End Optimization of Model Harnesses
arXiv:2603.28052v1
Abstract
The performance of large language model (LLM) systems depends not only on model weights, but also on their harness: the code that determines what information to store, retrieve, and present to the model. Yet harnesses are still designed largely by hand, and existing text optimizers are poorly matched to this setting because they compress feedback too aggressively.
We introduce Meta-Harness, an outer-loop system that searches over harness code for LLM applications. It uses an agentic proposer that accesses the source code, scores, and execution traces of all prior candidates through a filesystem.
Key Findings
The reported results cover three settings:
- Improved Context Management: On online text classification, Meta-Harness improves over a state-of-the-art context management system by 7.7 points while using 4x fewer context tokens.
- Enhanced Math Reasoning: In retrieval-augmented math reasoning, a single discovered harness improves accuracy on 200 IMO-level problems by 4.7 points on average across five held-out models.
- Superior Agentic Coding: In the domain of agentic coding, discovered harnesses surpass the best hand-engineered baselines on TerminalBench-2.
Introduction to Meta-Harness
Meta-Harness targets automated harness engineering. Harnesses are still written largely by hand, and existing text optimizers fit this setting poorly because they compress feedback too aggressively. Meta-Harness instead runs an outer loop that searches over harness code directly and keeps the full record of prior candidates available to the proposer.
How Meta-Harness Works
Meta-Harness is driven by an agentic proposer that searches over harness code. Through a filesystem, the proposer can access the following for every prior candidate:
- Its source code
- Its score
- Its execution traces
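The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the directory layout (harness.py, score.json, trace.log), the function names record_candidate, load_history, and search, and the toy proposer and evaluator are all hypothetical. In the real system the proposer is an LLM agent; here it is replaced by a trivial rule so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def record_candidate(root: Path, name: str, source: str,
                     score: float, trace: list[str]) -> Path:
    """Store one candidate's source, score, and trace in its own directory."""
    cand = root / name
    cand.mkdir(parents=True, exist_ok=False)
    (cand / "harness.py").write_text(source)
    (cand / "score.json").write_text(json.dumps({"score": score}))
    (cand / "trace.log").write_text("\n".join(trace))
    return cand


def load_history(root: Path) -> list[dict]:
    """Read every prior candidate back from disk, uncompressed."""
    history = []
    for cand in sorted(p for p in root.iterdir() if p.is_dir()):
        history.append({
            "name": cand.name,
            "source": (cand / "harness.py").read_text(),
            "score": json.loads((cand / "score.json").read_text())["score"],
            "trace": (cand / "trace.log").read_text().splitlines(),
        })
    return history


def search(root: Path,
           propose: Callable[[list[dict]], str],
           evaluate: Callable[[str], tuple[float, list[str]]],
           iterations: int) -> dict:
    """Outer loop: propose from full history, evaluate, record, repeat."""
    for i in range(iterations):
        source = propose(load_history(root))
        score, trace = evaluate(source)
        record_candidate(root, f"cand_{i:03d}", source, score, trace)
    return max(load_history(root), key=lambda c: c["score"])


# --- Hypothetical stand-ins so the sketch is runnable ---

def toy_evaluate(source: str) -> tuple[float, list[str]]:
    """Toy task: the harness sets TOP_K, and the score peaks at TOP_K == 3."""
    ns: dict = {}
    exec(source, ns)  # acceptable only because the toy source is trusted
    k = ns["TOP_K"]
    score = float(-abs(k - 3))
    return score, [f"ran with TOP_K={k}", f"score={score}"]


def toy_propose(history: list[dict]) -> str:
    """Stand-in for the agentic proposer: bump TOP_K of the best candidate."""
    if not history:
        return "TOP_K = 1\n"
    best = max(history, key=lambda c: c["score"])
    ns: dict = {}
    exec(best["source"], ns)
    return f"TOP_K = {ns['TOP_K'] + 1}\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        best = search(Path(tmp), toy_propose, toy_evaluate, iterations=5)
        print(best["name"], best["score"], best["source"].strip())
```

The point of the layout is that nothing is summarized away: each iteration the proposer receives the complete source, score, and trace of every earlier candidate, which is the property the abstract contrasts with text optimizers that compress feedback.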
Impact on Large Language Model Applications
These results suggest that harness design is a meaningful lever on LLM system performance alongside model weights. In the reported experiments, searching over harness code raised accuracy while using fewer context tokens in online text classification, produced a single harness that transferred across five held-out models in math reasoning, and surpassed the best hand-engineered baselines in agentic coding.
Conclusion
In conclusion, Meta-Harness shows that richer access to prior experience (the source code, scores, and execution traces of earlier candidates) can enable automated harness engineering. As LLM systems continue to evolve, optimizing the harness is likely to matter alongside improving the model itself.
