Auto-FP: Automated Feature Preprocessing for Tabular Data

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

Source: arXiv:2310.02540v2

Abstract

Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to the distribution of their input data, so feature preprocessing, which transforms features from one distribution to another, is a crucial step for ensuring good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists must decide which preprocessors to select and in which order to compose them.
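To make the ordering decision concrete, here is a minimal sketch in plain Python. The two preprocessors (`min_max_scale`, `log_transform`) and the sample feature are illustrative assumptions, not taken from the study. The point is only that composing the same preprocessors in a different order produces a different feature distribution.

```python
import math


def min_max_scale(xs):
    """Rescale values linearly into [0, 1] (hypothetical preprocessor)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def log_transform(xs):
    """Apply log(1 + x) to compress large values; assumes every x > -1."""
    return [math.log1p(x) for x in xs]


def apply_pipeline(steps, xs):
    """Apply each preprocessor in order, feeding one step's output to the next."""
    for step in steps:
        xs = step(xs)
    return xs


# Made-up, heavily skewed feature column.
feature = [1.0, 10.0, 100.0, 1000.0]

scale_then_log = apply_pipeline([min_max_scale, log_transform], feature)
log_then_scale = apply_pipeline([log_transform, min_max_scale], feature)

print(scale_then_log)
print(log_then_scale)
```

Scaling first squeezes most values close to zero before the log is applied, while taking the log first spreads them more evenly across [0, 1], so the two orders hand the downstream model different inputs.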

Introduction

In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Because the search space is large, a brute-force solution is prohibitively expensive. To address this challenge, we observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) problem or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem.
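A rough sense of why brute force is expensive can be had by counting candidate pipelines. The sketch below assumes a pipeline is an ordered sequence of preprocessors in which repeats are allowed, and the values of 7 preprocessors and a maximum length of 7 are chosen purely for illustration. Under those assumptions the count is the sum of n^k for k from 1 to the maximum length.

```python
from itertools import product


def count_pipelines(num_preprocessors, max_length):
    """Count ordered pipelines of length 1..max_length, repeats allowed."""
    return sum(num_preprocessors ** k for k in range(1, max_length + 1))


def enumerate_pipelines(preprocessors, max_length):
    """Yield every candidate pipeline as a tuple (only practical for tiny spaces)."""
    for length in range(1, max_length + 1):
        yield from product(preprocessors, repeat=length)


# Illustrative sizes only; not figures taken from the study.
print(count_pipelines(7, 7))  # 960799
```

Each of these candidates would require training and validating a downstream model, which is what makes exhaustive enumeration impractical even for a modest number of preprocessors.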

Methodology

We conduct a comprehensive evaluation and analysis of 15 algorithms on 45 public ML datasets. Overall, evolution-based algorithms achieve the best average ranking. Surprisingly, random search turns out to be a strong baseline. Many surrogate-model-based and bandit-based search algorithms, which perform well for HPO and NAS, do not outperform random search for Auto-FP.
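The two families highlighted above can be sketched side by side under the same evaluation budget. This is a toy illustration, not the study's implementation: the preprocessor labels, the `toy_score` function (a stand-in for training a downstream model and measuring validation accuracy), and the particular evolution loop (tournament selection, one mutation per step, oldest member dropped) are all assumptions made for the example.

```python
import random

PREPROCESSORS = ["scale", "log", "clip", "binarize"]  # hypothetical labels
MAX_LENGTH = 4  # assumed cap on pipeline length


def toy_score(pipeline):
    """Made-up objective; a real evaluator would train and validate a model."""
    target = ("log", "scale")
    matches = sum(a == b for a, b in zip(pipeline, target))
    return matches - 0.1 * abs(len(pipeline) - len(target))


def random_pipeline(rng):
    """Sample a pipeline of random length with random preprocessors."""
    length = rng.randint(1, MAX_LENGTH)
    return tuple(rng.choice(PREPROCESSORS) for _ in range(length))


def random_search(budget, rng):
    """Evaluate `budget` independent random pipelines and keep the best."""
    best = max((random_pipeline(rng) for _ in range(budget)), key=toy_score)
    return best, toy_score(best)


def mutate(pipeline, rng):
    """Replace, insert, or delete one step; may return the pipeline unchanged."""
    steps = list(pipeline)
    op = rng.choice(["replace", "insert", "delete"])
    if op == "replace":
        steps[rng.randrange(len(steps))] = rng.choice(PREPROCESSORS)
    elif op == "insert" and len(steps) < MAX_LENGTH:
        steps.insert(rng.randint(0, len(steps)), rng.choice(PREPROCESSORS))
    elif op == "delete" and len(steps) > 1:
        del steps[rng.randrange(len(steps))]
    return tuple(steps)


def evolution_search(budget, rng, population_size=5):
    """Evolve a small population: pick a parent by tournament, mutate, age out."""
    population = [random_pipeline(rng) for _ in range(population_size)]
    best = max(population, key=toy_score)
    for _ in range(budget - population_size):
        parent = max(rng.sample(population, 2), key=toy_score)
        child = mutate(parent, rng)
        population.append(child)
        population.pop(0)  # drop the oldest member
        if toy_score(child) > toy_score(best):
            best = child
    return best, toy_score(best)


rng = random.Random(0)
print(random_search(50, rng))
print(evolution_search(50, rng))
```

Random search spends every evaluation on a fresh sample, while the evolution loop reuses what it has already found by mutating good candidates. Which one wins on a given dataset is an empirical question, and the toy objective here says nothing about real model quality.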

Results

We analyze the reasons for these findings and conduct a bottleneck analysis to identify opportunities for improving these algorithms. Furthermore, we explore how to extend Auto-FP to support parameter search and compare two ways of achieving this goal.

Discussion

Finally, we evaluate Auto-FP in an AutoML context and discuss the limitations of popular AutoML tools. To the best of our knowledge, this is the first study on automated feature preprocessing. We hope our work can inspire researchers to develop new algorithms tailored for Auto-FP.

Conclusion

This study highlights the potential of automated feature preprocessing in enhancing the performance of classical machine learning models. With the insights gained from our evaluation, we aim to pave the way for future research in this vital area of machine learning.

Key Takeaways

  • Feature preprocessing is critical for model quality.
  • Auto-FP can be framed as a hyperparameter optimization or neural architecture search problem.
  • Evolution-based algorithms show strong performance, while random search serves as a competitive baseline.
  • Surrogate-model-based and bandit-based algorithms may not outperform simpler methods in this context.
  • The study encourages the development of tailored algorithms for automated feature preprocessing.


Lazarus Omolua