Top 5 Python Scripts for Feature Selection in ML

5 Useful Python Scripts for Effective Feature Selection

Effective feature selection is crucial for building predictive models that perform well and remain interpretable. This article introduces five short Python scripts that can help data scientists and machine learning practitioners select the most relevant features for their projects. Each script is practical and easy to adapt to real-world data.

1. Recursive Feature Elimination (RFE)

Recursive Feature Elimination (RFE) repeatedly fits a model, ranks the features by importance (for example, by coefficient magnitude), and removes the least important ones until the requested number remains. Here's a simple implementation using scikit-learn:

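from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# A higher max_iter helps the solver converge on the unscaled iris data
model = LogisticRegression(max_iter=1000)
# n_features_to_select must be passed by keyword in current scikit-learn
rfe = RFE(model, n_features_to_select=3)
fit = rfe.fit(X, y)

print("Selected Features: %s" % fit.support_)  # boolean mask, True = kept
print("Feature Ranking: %s" % fit.ranking_)    # rank 1 = selected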
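To make the elimination loop concrete, here is a minimal hand-written sketch of the same idea. It assumes that a feature's importance is the sum of the absolute logistic regression coefficients across classes and that one feature is dropped per round; the helper name `manual_rfe` is hypothetical, and scikit-learn's own RFE may score features differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def manual_rfe(X, y, n_features_to_keep):
    """Drop the weakest feature one at a time (illustrative helper)."""
    remaining = list(range(X.shape[1]))  # original column indices still in play
    while len(remaining) > n_features_to_keep:
        model = LogisticRegression(max_iter=1000)
        model.fit(X[:, remaining], y)
        # Assumed importance measure: sum of |coefficient| over all classes
        importance = np.abs(model.coef_).sum(axis=0)
        weakest = int(np.argmin(importance))
        remaining.pop(weakest)  # remove the least important column
    return remaining


iris = load_iris()
kept = manual_rfe(iris.data, iris.target, n_features_to_keep=3)
print("Kept features:", [iris.feature_names[i] for i in kept])
```

Because the model is refit after every removal, a feature's importance is always judged relative to the features that are still present.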

2. Lasso Regularization

Lasso regression adds an L1 penalty, proportional to the sum of the absolute values of the coefficients, to the loss function. A sufficiently strong penalty drives some coefficients to exactly zero, so the features with non-zero coefficients can be treated as selected. The following script demonstrates this on synthetic data:

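from sklearn.linear_model import Lasso
import numpy as np

# Synthetic data in which only features 0 and 3 influence the target;
# with a purely random target, Lasso would shrink every coefficient to zero
rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(100)

# alpha controls the penalty strength: larger values zero out more coefficients
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

print("Coefficients: %s" % lasso.coef_)
# Features with a non-zero coefficient are the ones Lasso kept
print("Selected Features: %s" % np.where(lasso.coef_ != 0)[0])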
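How many features survive depends heavily on `alpha`. The sketch below refits Lasso over a few penalty strengths and counts the non-zero coefficients. The synthetic data set (`X_demo`, `y_demo`) and the list of alpha values are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical synthetic data: only features 0 and 3 drive the target
rng = np.random.default_rng(0)
X_demo = rng.random((200, 10))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 3] + 0.1 * rng.standard_normal(200)

nonzero_counts = {}
for alpha in [0.001, 0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha).fit(X_demo, y_demo).coef_
    # Count how many coefficients the penalty left non-zero
    nonzero_counts[alpha] = int(np.count_nonzero(coef))
    print("alpha=%g -> %d non-zero coefficients" % (alpha, nonzero_counts[alpha]))
```

In practice, alpha is usually chosen by cross-validation (for example with scikit-learn's `LassoCV`) rather than fixed by hand.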

3. Feature Importance from Tree-based Models

Tree-based models like Random Forests can provide feature importance scores, which can be used to select the most relevant features. Below is an example using the Random Forest model:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
# random_state fixes the seed so the ranking is reproducible
model = RandomForestClassifier(random_state=0)
model.fit(X, y)

# Sort feature indices from most to least important
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]

print("Feature ranking:")
for f in range(X.shape[1]):
    print("%d. Feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))
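A ranking alone does not reduce the data set. One way to turn importance scores into an actual selection is scikit-learn's `SelectFromModel`, sketched below. The `"median"` threshold is an assumption made for illustration; a suitable cutoff depends on the data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

# Keep only features whose importance is at least the median importance
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print("Kept features:", [iris.feature_names[i] for i in kept])
print("Reduced shape:", X_selected.shape)
```

The fitted selector can be reused on new data with `selector.transform`, which keeps the training and prediction pipelines consistent.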

4. Univariate Feature Selection

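This method scores each feature independently with a univariate statistical test and keeps the highest-scoring ones. The chi-squared test used here requires non-negative feature values, which holds for the iris measurements. The following script demonstrates how to implement this using SelectKBest: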

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target
selector = SelectKBest(score_func=chi2, k=3)
X_new = selector.fit_transform(X, y)

print("Selected Features: %s" % selector.get_support(indices=True))
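When features can take negative values, the chi-squared test no longer applies. A possible alternative is the ANOVA F-test (`f_classif`), sketched below together with the per-feature scores and p-values that the fitted selector exposes. The choice of `k=2` is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

# f_classif computes an ANOVA F-statistic per feature; it accepts negative values
selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)

# Inspect the score and p-value of every feature, not only the selected ones
for name, score, pvalue in zip(iris.feature_names, selector.scores_, selector.pvalues_):
    print("%-20s F=%.1f  p=%.3g" % (name, score, pvalue))

selected = selector.get_support(indices=True)
print("Selected Features: %s" % selected)
```

Looking at the full list of scores helps judge whether the cutoff at `k` separates clearly strong features from weak ones or falls in the middle of a group of similar scores.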

5. Correlation Matrix

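A correlation matrix can help identify features that are strongly correlated with the target variable, as well as pairs of features that are strongly correlated with each other and may be redundant. Below is an example of how to compute and visualize it using Pandas and Seaborn: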

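import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the data here so the script runs on its own
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['target'] = iris.target
corr = data.corr()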

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True)
plt.show()
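The heatmap is a visual aid. To select features programmatically, one option is to keep those whose absolute correlation with the target exceeds a cutoff. The sketch below assumes a cutoff of 0.5, which is arbitrary, and treats the integer class label as a numeric target, which is a rough simplification for classification problems.

```python
from sklearn.datasets import load_iris

# as_frame=True returns a pandas DataFrame with the features plus a "target" column
frame = load_iris(as_frame=True).frame

# Correlation of every feature with the target (the target itself is dropped)
target_corr = frame.corr()["target"].drop("target")

threshold = 0.5  # hypothetical cutoff, tune for your data
selected = target_corr[target_corr.abs() > threshold].index.tolist()

print(target_corr.abs().sort_values(ascending=False))
print("Selected Features: %s" % selected)
```

Pearson correlation only captures linear relationships, so a feature with a low score here may still be useful to a non-linear model.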

Conclusion

Feature selection is a pivotal step in the data preprocessing phase of machine learning. The five Python scripts presented in this article provide various methods to select relevant features effectively. By implementing these techniques, data scientists can enhance model performance and interpretability, paving the way for better insights and decision-making.


Lazarus Omolua
https://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.