Algorithm Selection with Zero Domain Knowledge via Text Embeddings
arXiv:2604.19753v1
Abstract
Researchers have introduced ZeroFolio, a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. ZeroFolio works in three steps: it reads the raw instance file as plain text, embeds that text with a pretrained embedding model, and selects an algorithm with weighted k-nearest neighbors.
Key Insights
The key observation behind ZeroFolio is that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training. As a result, the same three-step pipeline (serialize, embed, select) can be applied across diverse problem domains that use text-based instance formats.
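The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical stand-in that hashes character trigrams so the example runs without a model download, where a real pipeline would call a pretrained text embedding model. The sketch also assumes each training instance is labelled with its best-performing algorithm, and the solver names are made up.

```python
import hashlib
import math
from collections import defaultdict


def serialize(path):
    """Step 1: read the raw instance file as plain text, with no parsing."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def toy_embed(text, dim=64):
    """Step 2 (placeholder): hash character trigrams into a unit-length vector.

    This only stands in for a pretrained embedding model so the sketch runs
    without external dependencies; it is not the model used in the study.
    """
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def select(query_vec, train_vecs, train_best, k=3, eps=1e-9):
    """Step 3: the k nearest training instances vote for their best algorithm.

    Assumption: each training instance carries the name of its best algorithm,
    and closer neighbors get a larger say (inverse-distance weights).
    """
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(query_vec, vec)), label)
        for vec, label in zip(train_vecs, train_best)
    )
    votes = defaultdict(float)
    for dist, label in dists[:k]:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# In-memory texts stand in for files that serialize() would read from disk.
train_texts = [
    "p cnf 3 2\n1 -2 0\n2 3 0\n",
    "p cnf 4 2\n1 -3 0\n2 4 0\n",
    "maximize x + y\nsubject to x <= 4\n",
]
train_best = ["solver_a", "solver_a", "solver_b"]  # hypothetical solver names
train_vecs = [toy_embed(text) for text in train_texts]

print(select(toy_embed("p cnf 3 1\n1 -3 0\n"), train_vecs, train_best, k=1))
```

Nothing in the sketch inspects the instance format: the same code handles a CNF file and an LP-style file, which is the property the pipeline relies on.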
Evaluation and Results
The research team evaluated the approach on 11 ASlib scenarios spanning 7 domains:
- Satisfiability (SAT)
- Maximum Satisfiability (MaxSAT)
- Quantified Boolean Formulas (QBF)
- Answer Set Programming (ASP)
- Constraint Satisfaction Problems (CSP)
- Mixed Integer Programming (MIP)
- Graph Problems
With a single fixed configuration, ZeroFolio outperforms a random forest trained on hand-crafted features in 10 of the 11 scenarios. With two-seed voting, it outperforms the random forest in all 11 scenarios, often by a substantial margin.
Design Choices and Improvements
An ablation study identified several design choices that are key to ZeroFolio's performance, including:
- Inverse-distance weighting
- Line shuffling
- Manhattan distance
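The following sketch illustrates what each of the three choices listed above might look like in code. It assumes that line shuffling means randomly permuting the lines of the instance text before embedding; the function names, the toy vectors, and the solver names are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def shuffle_lines(text, seed):
    """Randomly permute the lines of an instance file (seeded for repeatability)."""
    lines = text.splitlines()
    random.Random(seed).shuffle(lines)
    return "\n".join(lines)


def manhattan(u, v):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def knn_vote(neighbors, inverse_distance=True, eps=1e-9):
    """Vote over (distance, algorithm) pairs and return the winning algorithm."""
    votes = defaultdict(float)
    for dist, algo in neighbors:
        # Inverse-distance weighting gives near neighbors a larger vote;
        # the alternative counts every neighbor equally.
        votes[algo] += 1.0 / (dist + eps) if inverse_distance else 1.0
    return max(votes, key=votes.get)


query = [0.0, 0.0]
train = [
    ([0.1, 0.0], "solver_a"),
    ([1.0, 1.0], "solver_b"),
    ([1.0, 1.2], "solver_b"),
]
neighbors = [(manhattan(query, vec), algo) for vec, algo in train]

print(knn_vote(neighbors, inverse_distance=False))  # plain majority: solver_b
print(knn_vote(neighbors, inverse_distance=True))   # nearest neighbor dominates: solver_a
```

In this toy case the two voting rules disagree: two distant neighbors outvote one very close neighbor under uniform weights, while inverse-distance weighting lets the close neighbor decide.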
Additionally, in scenarios where both ZeroFolio and traditional selectors are competitive, combining embeddings with hand-crafted features via soft voting improves performance further.
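Soft voting can be sketched as a weighted average of the per-algorithm probabilities produced by two selectors. The example below is a minimal illustration that assumes each selector outputs a probability per algorithm and that both are weighted equally; the function name `soft_vote`, the probabilities, and the solver names are hypothetical, and the study's exact combination rule may differ.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Blend two selectors' per-algorithm probabilities and pick the top one."""
    algos = set(prob_a) | set(prob_b)
    blended = {
        algo: weight_a * prob_a.get(algo, 0.0)
        + (1.0 - weight_a) * prob_b.get(algo, 0.0)
        for algo in algos
    }
    # Iterating in sorted order makes tie-breaking deterministic.
    return max(sorted(blended), key=blended.get), blended


# Hypothetical outputs: one selector built on embeddings, one on hand-crafted features.
embedding_probs = {"solver_a": 0.55, "solver_b": 0.45}
feature_probs = {"solver_a": 0.30, "solver_b": 0.70}

choice, blended = soft_vote(embedding_probs, feature_probs)
print(choice)  # solver_b
```

Unlike hard voting, where each selector casts a single vote, soft voting lets a confident selector override a hesitant one, as the feature-based selector does here.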
Conclusion
ZeroFolio shows that algorithm selection can perform well without domain-specific knowledge or hand-crafted feature engineering. This simplifies the algorithm selection process and opens the way to research and applications across a range of problem domains.
