Hubble: An LLM-Driven Agentic Framework for Safe and Automated Alpha Factor Discovery
arXiv:2604.09601v1
Abstract
Discovering predictive alpha factors in quantitative finance remains a formidable challenge due to the vast combinatorial search space and inherently low signal-to-noise ratios in financial data. Existing automated methods, particularly genetic programming, often produce complex, uninterpretable formulas prone to overfitting. We introduce Hubble, a closed-loop factor mining framework that leverages Large Language Models (LLMs) as intelligent search heuristics, constrained by a domain-specific operator language and an Abstract Syntax Tree (AST)-based execution sandbox.
Framework Overview
The Hubble framework evaluates candidate factors through a statistical pipeline covering cross-sectional Rank Information Coefficient (RankIC), annualized Information Ratio, and portfolio turnover. Candidates are scored on these metrics so that only the strongest are carried forward for further analysis.
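To make the three metrics concrete, here is a minimal sketch using only the Python standard library. It is not Hubble's implementation: the function names are hypothetical, the Information Ratio is assumed to be the mean of the daily RankIC series divided by its standard deviation and scaled by the square root of 252, and turnover is assumed to be half the summed absolute change in portfolio weights per day. The source does not define any of these exactly.

```python
import math
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def rank_ic(factor_row, fwd_return_row):
    """Cross-sectional RankIC for one date: Spearman correlation across assets."""
    return pearson(ranks(factor_row), ranks(fwd_return_row))


def annualized_ir(daily_ics, periods_per_year=252):
    """Assumed definition: mean(RankIC) / std(RankIC) * sqrt(periods_per_year)."""
    clean = [ic for ic in daily_ics if not math.isnan(ic)]
    if len(clean) < 2:
        return float("nan")
    sd = stdev(clean)
    if sd == 0:
        return float("nan")
    return mean(clean) / sd * math.sqrt(periods_per_year)


def mean_turnover(weights_by_day):
    """Assumed definition: average of half the summed absolute weight change."""
    changes = [
        0.5 * sum(abs(a - b) for a, b in zip(today, prev))
        for prev, today in zip(weights_by_day, weights_by_day[1:])
    ]
    return mean(changes) if changes else float("nan")
```

Each row passed to rank_ic holds one value per asset for a single date, so a 30-stock panel would yield one RankIC per trading day and annualized_ir would summarize that daily series.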
Methodology
Hubble's evolutionary feedback mechanism returns the top-performing factors and structured error diagnostics to the LLM. This lets the model refine its candidates across multiple generation rounds instead of generating each round from scratch.
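The feedback loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: Result, build_feedback, and run_rounds are hypothetical names, generate stands in for the LLM call, evaluate stands in for the sandbox plus scoring pipeline, and the feedback text format is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    expression: str
    score: Optional[float] = None  # composite score when evaluation succeeded
    error: Optional[str] = None    # diagnostic when parsing or evaluation failed


def build_feedback(results, top_k=3):
    """Summarize the best factors and the failures as text for the next prompt."""
    ok = sorted(
        (r for r in results if r.error is None),
        key=lambda r: r.score,
        reverse=True,
    )
    failed = [r for r in results if r.error is not None]
    lines = ["Top factors so far:"]
    lines += [f"  {r.expression}  (score={r.score:.3f})" for r in ok[:top_k]]
    lines.append("Rejected candidates:")
    lines += [f"  {r.expression}  -> {r.error}" for r in failed]
    return "\n".join(lines)


def run_rounds(generate, evaluate, n_rounds=3):
    """Closed loop: generate candidates, evaluate them, feed the summary back."""
    history = []
    seen = set()
    feedback = ""  # the first round has no feedback yet
    for _ in range(n_rounds):
        for expr in generate(feedback):
            if expr in seen:  # skip candidates already evaluated
                continue
            seen.add(expr)
            history.append(evaluate(expr))
        feedback = build_feedback(history)
    return history
```

Passing generate and evaluate in as plain callables keeps the loop deterministic and testable with stubs, independent of whichever model or scoring code is plugged in.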
Experimental Results
In experiments on a panel of 30 U.S. equities over 752 trading days, the system evaluated 181 syntactically valid factors from 122 unique candidates across three rounds. The peak composite score was 0.827, with 100% computational stability.
Conclusion
Our results demonstrate that combining LLM-driven generation with deterministic safety constraints yields an effective, interpretable, and reproducible approach to automated factor discovery. This advancement not only enhances the efficiency of factor discovery in quantitative finance but also addresses the interpretability issues that have plagued earlier methods.
Key Features of Hubble
- LLM Integration: Utilizes Large Language Models for intelligent search heuristics.
- Domain-Specific Language: Constrained by an operator language tailored for financial data.
- AST-Based Sandbox: Executes candidate factors inside an Abstract Syntax Tree (AST)-based sandbox rather than running generated code directly.
- Statistical Rigor: Scores candidates on cross-sectional RankIC, annualized Information Ratio, and portfolio turnover.
- Iterative Refinement: Features an evolutionary feedback loop for continuous improvement.
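One way an AST-based check on a constrained operator language might look is sketched below with Python's standard ast module. The operator and field names (rank, delta, ts_mean, close, volume, and so on) are hypothetical, since the source does not list Hubble's actual operator set, and the validator shows the general whitelisting technique rather than the framework's own sandbox.

```python
import ast

# Hypothetical whitelists; the real operator language is not given in the source.
ALLOWED_FUNCS = {"rank", "delta", "ts_mean"}
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)


def _check(node):
    """Return None if the subtree is acceptable, else a diagnostic string."""
    if isinstance(node, ast.Expression):
        return _check(node.body)
    if isinstance(node, ast.Constant):
        ok = isinstance(node.value, (int, float)) and not isinstance(node.value, bool)
        return None if ok else "only numeric constants are allowed"
    if isinstance(node, ast.Name):
        return None if node.id in ALLOWED_FIELDS else f"unknown field: {node.id}"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return _check(node.operand)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_BINOPS):
        return _check(node.left) or _check(node.right)
    if isinstance(node, ast.Call):
        # Only direct calls to whitelisted operator names are accepted.
        if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_FUNCS:
            return "disallowed call target"
        if node.keywords:
            return "keyword arguments are not allowed"
        for arg in node.args:
            err = _check(arg)
            if err:
                return err
        return None
    # Attributes, subscripts, lambdas, comprehensions, etc. all land here.
    return f"disallowed syntax: {type(node).__name__}"


def validate(expr):
    """Parse a candidate factor and return None if it passes, else a diagnostic."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    return _check(tree)
```

A validator of this kind only decides whether an expression is acceptable. A complete sandbox would then interpret the approved tree against trusted operator implementations instead of passing the string to eval. The diagnostic strings it returns are the sort of structured error that could be fed back to the LLM.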
Future Directions
As the financial landscape evolves, the Hubble framework can adapt to include additional variables and factors, expanding its applicability across various market conditions. Future research will focus on enhancing the algorithm’s robustness and exploring its potential in different asset classes beyond equities.
