Auto-FP: Automated Feature Preprocessing for Tabular Data

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

Source: arXiv:2310.02540v2

Abstract

Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to the distribution of the input data, so feature preprocessing, which transforms features from one distribution to another, is a crucial step in ensuring good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists must decide which preprocessors to select and in which order to compose them.
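To make the point about ordering concrete, here is a minimal sketch in plain Python. The two toy preprocessors (`min_max` and `binarize`), the threshold of 0.5, and the sample feature values are assumptions made up for illustration; they are not taken from the paper. The same two steps give different outputs depending on the order in which they are composed.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def binarize(values, threshold=0.5):
    """Map each value to 1.0 if it exceeds the threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


def apply_pipeline(values, steps):
    """Apply each preprocessing step in order to a single feature column."""
    for step in steps:
        values = step(values)
    return values


feature = [1.0, 2.0, 3.0, 10.0]  # hypothetical feature column

scaled_then_binarized = apply_pipeline(feature, [min_max, binarize])
binarized_then_scaled = apply_pipeline(feature, [binarize, min_max])

print(scaled_then_binarized)  # [0.0, 0.0, 0.0, 1.0]
print(binarized_then_scaled)  # [0.0, 0.0, 0.0, 0.0]
```

In the second ordering, binarizing first collapses every value to 1.0, so the later rescaling has nothing left to separate. A downstream model would see two very different features from the same pair of preprocessors.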

Introduction

In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Because the search space is large, a brute-force solution is prohibitively expensive. To address this challenge, we observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) problem or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem.
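A rough sketch of why brute force becomes expensive: if a pipeline is an ordered sequence of preprocessors in which repeats are allowed, the number of candidates grows geometrically with the maximum length. The candidate set of seven preprocessors (named after scikit-learn classes purely for illustration) and the maximum length of 7 are assumptions for this example, not figures quoted from the text above.

```python
from itertools import product

# Hypothetical candidate set; the names are labels only, nothing is imported.
PREPROCESSORS = [
    "StandardScaler",
    "MinMaxScaler",
    "MaxAbsScaler",
    "Normalizer",
    "PowerTransformer",
    "QuantileTransformer",
    "Binarizer",
]


def count_pipelines(n_preprocessors, max_length):
    """Count ordered pipelines of length 1..max_length, repeats allowed."""
    return sum(n_preprocessors ** k for k in range(1, max_length + 1))


def enumerate_pipelines(preprocessors, max_length):
    """Lazily yield every pipeline as a tuple of preprocessor names."""
    for length in range(1, max_length + 1):
        yield from product(preprocessors, repeat=length)


print(count_pipelines(len(PREPROCESSORS), 7))  # 960799

# Enumeration is only practical for short pipelines.
short_pipelines = list(enumerate_pipelines(PREPROCESSORS, 2))
print(len(short_pipelines))  # 56
```

Each candidate requires training and validating a downstream model, so even under these modest assumptions an exhaustive sweep means close to a million training runs per dataset. Viewing each pipeline as a point in a discrete configuration space is what lets HPO and NAS search algorithms be applied instead.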

Methodology

We conduct a comprehensive evaluation and analysis of 15 algorithms on 45 public ML datasets. Overall, evolution-based algorithms achieve the best average ranking. Surprisingly, random search turns out to be a strong baseline. Many surrogate-model-based and bandit-based search algorithms, which perform well for HPO and NAS, do not outperform random search for Auto-FP.
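The two families highlighted here can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the algorithms evaluated in the study: `toy_score` is a made-up stand-in for the validation accuracy of a downstream model, the candidate names are labels borrowed from scikit-learn, and the evolutionary loop is a simple aging-style variant (tournament selection, one mutation per step, oldest member removed) chosen for brevity.

```python
import random

PREPROCESSORS = ["StandardScaler", "MinMaxScaler", "MaxAbsScaler", "Normalizer",
                 "PowerTransformer", "QuantileTransformer", "Binarizer"]
MAX_LENGTH = 7  # assumed cap on pipeline length


def toy_score(pipeline):
    """Hypothetical objective: reward closeness to a hidden 'good' pipeline."""
    target = ("PowerTransformer", "StandardScaler")
    matches = sum(a == b for a, b in zip(pipeline, target))
    return matches / len(target) - 0.05 * abs(len(pipeline) - len(target))


def sample_pipeline(rng):
    """Draw a random pipeline of random length."""
    length = rng.randint(1, MAX_LENGTH)
    return tuple(rng.choice(PREPROCESSORS) for _ in range(length))


def random_search(score_fn, budget, rng):
    """Evaluate `budget` independent random pipelines and keep the best."""
    best, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = sample_pipeline(rng)
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def mutate(pipeline, rng):
    """Replace, insert, or delete one step, staying within length bounds."""
    ops = ["replace"]
    if len(pipeline) < MAX_LENGTH:
        ops.append("insert")
    if len(pipeline) > 1:
        ops.append("delete")
    op = rng.choice(ops)
    steps = list(pipeline)
    if op == "replace":
        steps[rng.randrange(len(steps))] = rng.choice(PREPROCESSORS)
    elif op == "insert":
        steps.insert(rng.randint(0, len(steps)), rng.choice(PREPROCESSORS))
    else:
        del steps[rng.randrange(len(steps))]
    return tuple(steps)


def evolution_search(score_fn, budget, rng, population_size=10, tournament_size=3):
    """Aging-style evolution: mutate a tournament winner, drop the oldest."""
    assert budget >= population_size >= tournament_size
    population = []
    for _ in range(population_size):
        pipeline = sample_pipeline(rng)
        population.append((score_fn(pipeline), pipeline))
    best_score, best = max(population)
    evaluations = population_size
    while evaluations < budget:
        _, parent = max(rng.sample(population, tournament_size))
        child = mutate(parent, rng)
        child_score = score_fn(child)
        evaluations += 1
        population.append((child_score, child))
        population.pop(0)  # remove the oldest member
        if child_score > best_score:
            best_score, best = child_score, child
    return best, best_score


rs_best, rs_score = random_search(toy_score, budget=200, rng=random.Random(0))
ev_best, ev_score = evolution_search(toy_score, budget=200, rng=random.Random(0))
print("random search:", rs_best, rs_score)
print("evolution:    ", ev_best, ev_score)
```

Both searches spend the same evaluation budget, which is the fair way to compare them. In a real setting `toy_score` would be replaced by training a model on the preprocessed data and returning a validation metric, and which method wins on a given dataset cannot be inferred from this toy objective.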

Results

We analyze the reasons behind these findings and conduct a bottleneck analysis to identify opportunities for improving these algorithms. Furthermore, we explore how to extend Auto-FP to support parameter search and compare two ways of achieving this goal.

Discussion

Finally, we evaluate Auto-FP in an AutoML context and discuss the limitations of popular AutoML tools. To the best of our knowledge, this is the first study on automated feature preprocessing. We hope our work can inspire researchers to develop new algorithms tailored for Auto-FP.

Conclusion

This study highlights the potential of automated feature preprocessing in enhancing the performance of classical machine learning models. With the insights gained from our evaluation, we aim to pave the way for future research in this vital area of machine learning.

Key Takeaways

Feature preprocessing is critical for model quality.
Auto-FP can be framed as a hyperparameter optimization or neural architecture search problem.
Evolution-based algorithms show strong performance, while random search serves as a competitive baseline.
Surrogate-model-based and bandit-based algorithms may not outperform simpler methods in this context.
The study encourages the development of tailored algorithms for automated feature preprocessing.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
