Greedy Is a Strong Default: Agents as Iterative Optimizers
arXiv:2603.27415v1
Abstract: Classical optimization algorithms (hill climbing, simulated annealing, population-based methods) generate candidate solutions via random perturbations. We replace the random proposal generator with an LLM agent that reasons about evaluation diagnostics to propose informed candidates, and ask: does the classical optimization machinery still help when the proposer is no longer random?
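The core idea is that the optimization loop stays fixed and only the proposal generator changes. Below is a minimal sketch, assuming a toy quadratic objective. Greedy hill climbing is written against an abstract `propose` function, so a random-perturbation proposer could in principle be swapped for an LLM-backed one with the same signature. The names `hill_climb`, `make_random_proposer` and the objective are illustrative, not taken from the paper.

```python
import random
from typing import Callable, List, Tuple

Config = List[float]
# A proposer maps (current best config, its score) to a new candidate.
# In the paper's setting an LLM agent reading evaluation diagnostics plays
# this role; here it is a plain random perturbation (illustrative only).
Proposer = Callable[[Config, float], Config]


def hill_climb(
    objective: Callable[[Config], float],
    propose: Proposer,
    start: Config,
    budget: int,
) -> Tuple[Config, float]:
    """Greedy hill climbing: keep a candidate only if it strictly improves the score."""
    best, best_score = start, objective(start)
    for _ in range(budget):
        candidate = propose(best, best_score)
        score = objective(candidate)
        if score > best_score:  # greedy acceptance rule
            best, best_score = candidate, score
    return best, best_score


def make_random_proposer(step: float, seed: int) -> Proposer:
    """Classical proposer: Gaussian perturbation of every coordinate."""
    rng = random.Random(seed)

    def propose(config: Config, _score: float) -> Config:
        return [x + rng.gauss(0.0, step) for x in config]

    return propose


if __name__ == "__main__":
    # Hypothetical toy objective with its maximum (0.0) at (1, -2).
    def objective(c: Config) -> float:
        return -((c[0] - 1.0) ** 2 + (c[1] + 2.0) ** 2)

    best, score = hill_climb(objective, make_random_proposer(0.3, seed=0), [0.0, 0.0], budget=500)
    print(best, score)
```

Because acceptance is greedy, the returned score can never be worse than the starting score. All of the "intelligence" lives in `propose`, which is exactly the component the paper replaces.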
Introduction
Large Language Models (LLMs) are increasingly applied to optimization tasks. This work examines LLM agents as the proposal generator inside classical iterative optimizers, in place of the random perturbations those algorithms traditionally use.
Research Objectives
The study asks two questions: does an LLM proposer produce better candidates than random perturbation, and if so, does the classical machinery built around the proposer (annealing-style acceptance rules, population-based or parallel search) still add value?
Methodology
The evaluation covers four tasks spanning discrete, mixed, and continuous search spaces. Each task is repeated over three independent runs to gauge run-to-run variability. The tasks are:
- Rule-based classification on Breast Cancer
- Mixed hyperparameter optimization for MobileNetV3-Small on STL-10
- LoRA fine-tuning of Qwen2.5-0.5B on SST-2
- XGBoost on Adult Census
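Random-search and perturbation-based baselines need a way to sample and locally perturb configurations in a mixed search space. The sketch below assumes a hypothetical space loosely modelled on the MobileNetV3-Small task. The parameter names and ranges are illustrative, not the paper's. Continuous values are perturbed on a log scale, integers by a bounded step, and categoricals by occasional resampling.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Value = Union[float, int, str]


@dataclass(frozen=True)
class LogFloat:
    low: float
    high: float


@dataclass(frozen=True)
class IntRange:
    low: int
    high: int


@dataclass(frozen=True)
class Choice:
    options: Tuple[str, ...]


Param = Union[LogFloat, IntRange, Choice]

# Hypothetical mixed search space (illustrative names and ranges).
SPACE: Dict[str, Param] = {
    "learning_rate": LogFloat(1e-5, 1e-1),
    "batch_size": IntRange(16, 256),
    "optimizer": Choice(("sgd", "adam", "adamw")),
}


def sample(space: Dict[str, Param], rng: random.Random) -> Dict[str, Value]:
    """Draw a configuration uniformly at random, as random search would."""
    out: Dict[str, Value] = {}
    for name, p in space.items():
        if isinstance(p, LogFloat):
            out[name] = math.exp(rng.uniform(math.log(p.low), math.log(p.high)))
        elif isinstance(p, IntRange):
            out[name] = rng.randint(p.low, p.high)
        else:
            out[name] = rng.choice(p.options)
    return out


def perturb(config: Dict[str, Value], space: Dict[str, Param],
            rng: random.Random, scale: float = 0.5) -> Dict[str, Value]:
    """Local random move of the kind used by hill-climbing and annealing baselines."""
    out = dict(config)
    for name, p in space.items():
        if isinstance(p, LogFloat):
            # Gaussian step in log space, clamped to the allowed range.
            x = math.log(out[name]) + rng.gauss(0.0, scale)
            out[name] = math.exp(min(max(x, math.log(p.low)), math.log(p.high)))
        elif isinstance(p, IntRange):
            # Integer step of up to 10% of the range, clamped to bounds.
            step = max(1, int((p.high - p.low) * 0.1))
            out[name] = min(max(out[name] + rng.randint(-step, step), p.low), p.high)
        else:
            # Occasionally resample the category (may pick the same one again).
            if rng.random() < 0.3:
                out[name] = rng.choice(p.options)
    return out
```

Moves like these are blind to how the last evaluation went. An LLM proposer, by contrast, can read diagnostics and choose where to move, which is the contrast the study tests.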
Results
Across the four tasks, the LLM-driven optimizer matched or outperformed the baselines it was compared against:
- Breast Cancer classification achieved a test accuracy ranging from 86.0% to 96.5%.
- MobileNetV3-Small optimization yielded accuracies between 84.5% and 85.8% with zero catastrophic failures, compared with a 60% catastrophic-failure rate for random search.
- LoRA fine-tuning on SST-2 improved accuracy from 89.5% to 92.7%, matching Optuna TPE with double the efficiency.
- XGBoost on Adult Census raised AUC from 0.9297 to 0.9317, tying CMA-ES while using one third as many evaluations.
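Absolute gains of very different size are easier to compare when each is also expressed as the fraction of the remaining gap to a perfect score that was closed. The helper below applies this arithmetic to the two results quoted as before/after pairs (LoRA on SST-2 and XGBoost on Adult Census). The `summarize` function is illustrative and not part of the paper.

```python
def summarize(before: float, after: float) -> dict:
    """Absolute gain, and the fraction of the remaining gap to 1.0 that was closed."""
    gain = after - before
    gap_closed = gain / (1.0 - before)
    return {"gain": gain, "gap_closed": gap_closed}


# Before/after values as quoted in the results list.
lora_sst2 = summarize(0.895, 0.927)    # accuracy
xgb_adult = summarize(0.9297, 0.9317)  # AUC

for name, s in [("LoRA / SST-2", lora_sst2), ("XGBoost / Adult", xgb_adult)]:
    print(f"{name}: +{s['gain']:.4f} absolute, {s['gap_closed']:.1%} of remaining gap closed")
```

For accuracy, the gap to 1.0 is the error rate, so the LoRA result removes roughly 30% of the remaining errors (0.032 / 0.105). For AUC the same ratio (about 2.8%) is only a rough yardstick.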
Discussion
A cross-task ablation found that neither simulated annealing nor parallel investigators improved on greedy hill climbing. Adding a second LLM (OpenAI Codex) required 2 to 3 times more evaluations without improving outcomes.
This suggests that the LLM's learned prior does most of the work, so more sophisticated acceptance rules have little effect on the outcome. The first round of proposals delivered most of the improvement, and the different strategies converged to similar configurations.
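The ablation contrasts acceptance rules. The sketch below shows the two rules being compared, assuming maximization and a geometric cooling schedule. The paper's actual annealing schedule is not described here, so the temperature parameters are illustrative. Greedy acceptance keeps a candidate only when it improves. The simulated-annealing (Metropolis) rule also accepts a worse candidate with probability exp(delta / T), where delta is the score change and T is the temperature.

```python
import math
import random


def greedy_accept(delta: float, _temperature: float, _rng: random.Random) -> bool:
    """Accept only strict improvements (delta = new_score - current_score)."""
    return delta > 0


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Simulated annealing: always accept improvements; accept a worse candidate
    with probability exp(delta / T), which shrinks as the temperature T cools."""
    if delta > 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(delta / temperature)


def geometric_temperature(t0: float, alpha: float, step: int) -> float:
    """Illustrative cooling schedule T_k = T0 * alpha**k."""
    return t0 * alpha ** step
```

When the proposer already concentrates on good regions, the occasional downhill moves that annealing allows have less to contribute. This is consistent with the ablation finding no gain over the greedy rule.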
Conclusion
The findings suggest that a simple approach, greedy hill climbing with early stopping, is an effective default optimization strategy when the proposer is an LLM. Beyond accuracy gains, the framework produces human-interpretable artifacts, such as cancer classification rules consistent with established cytopathology principles. More broadly, the work supports using LLMs as proposers in optimization and argues for keeping the surrounding algorithm simple.
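Greedy hill climbing with early stopping can be sketched as follows. The sketch assumes a patience rule (stop after `patience` consecutive proposals without improvement) plus a hard evaluation cap (`max_evals`). The paper's exact stopping criterion is not stated here, so both parameters and the function name are hypothetical. The proposer is left abstract, since the paper fills that role with an LLM agent.

```python
from typing import Callable, Tuple, TypeVar

C = TypeVar("C")


def greedy_with_early_stopping(
    evaluate: Callable[[C], float],
    propose: Callable[[C, float], C],
    start: C,
    max_evals: int,
    patience: int,
) -> Tuple[C, float, int]:
    """Return (best config, best score, number of evaluations used)."""
    best, best_score = start, evaluate(start)
    evals, stale = 1, 0  # the starting point costs one evaluation
    while evals < max_evals and stale < patience:
        candidate = propose(best, best_score)
        score = evaluate(candidate)
        evals += 1
        if score > best_score:  # greedy: keep only strict improvements
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1  # count consecutive non-improving proposals
    return best, best_score, evals
```

For example, with `evaluate = lambda x: -(x - 5) ** 2`, a proposer that returns `x + 1`, `start = 0` and `patience = 3`, the loop climbs to 5 and then stops after three failed proposals, using 9 evaluations. Early stopping fits the observation that most of the improvement arrives in the first round of proposals.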
