Feature Selection | Filter Methods

Last Updated : 23 Jul, 2025

Feature selection is an important step in the machine learning pipeline. By identifying and retaining only the most relevant features, we can build models that generalize better, train faster, and are easier to interpret. Among the various approaches, filter methods are popular due to their simplicity, speed, and independence from specific machine learning models.

What is Feature Selection?

Feature selection is the process of selecting a subset of relevant features (predictor variables) from a larger set. Unlike feature extraction, which creates new features from combinations or transformations of original ones, feature selection retains the original variables.

Let: \mathbf{X} \in \mathbb{R}^{n \times d} \quad \text{(feature matrix with } n \text{ samples and } d \text{ features)}

\mathbf{y} \in \mathbb{R}^n \quad \text{(target variable)}

Our goal is to select a subset S \subseteq \{1, 2, ..., d\} such that features in S are most predictive of \mathbf{y}.

Feature Selection: Where Do Filter Methods Fit?

| Method | Model Involvement | Speed | Captures Feature Interaction |
|---|---|---|---|
| Filter | No | Fast | No |
| Wrapper | Yes | Slow | Yes |
| Embedded | Yes | Moderate | Yes |

Filter methods are typically used **early in the pipeline**, especially during **exploratory data analysis (EDA)** or as a **first-pass reduction technique** before applying more sophisticated methods.

Filter Methods

Filter methods evaluate the relevance of features by examining their intrinsic properties, independently of any predictive model. This makes them highly scalable and general-purpose.

Key Characteristics

Common Filter Techniques

1. Variance Thresholding

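Features whose values barely change across samples carry little information for telling samples apart and are candidates for removal. Because variance depends on a feature's units, a single threshold is only meaningful when the features are on comparable scales.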

\text{Var}(X_j) = \frac{1}{n} \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2

Remove X_j if \text{Var}(X_j) < \theta.

```python
from sklearn.feature_selection import VarianceThreshold

# X: feature matrix of shape (n_samples, n_features)
selector = VarianceThreshold(threshold=0.01)
X_selected = selector.fit_transform(X)
```
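To make the formula concrete, the following minimal sketch computes the population variance by hand and compares it with what `VarianceThreshold` stores in its `variances_` attribute. The toy matrix `X_toy` is made up purely for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Made-up toy data: column 1 is constant, column 2 barely varies
X_toy = np.array([
    [1.0, 5.0, 0.10],
    [2.0, 5.0, 0.11],
    [3.0, 5.0, 0.10],
    [4.0, 5.0, 0.11],
])

# Population variance (divide by n), matching the formula above
manual_var = ((X_toy - X_toy.mean(axis=0)) ** 2).mean(axis=0)

selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X_toy)

print(manual_var)               # per-feature variances computed by hand
print(selector.variances_)      # variances computed by scikit-learn
print(selector.get_support())   # boolean mask of retained features
```

Only the first column has a variance above the threshold here, so it is the only one retained.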

2. Correlation Coefficient

Measures the strength of the linear relationship between a feature and the target. For continuous y, use the Pearson correlation:

\rho_{X_j, y} = \frac{\text{Cov}(X_j, y)}{\sigma_{X_j} \sigma_y}

Drop features with low absolute correlation |\rho| < \theta.

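```python
import pandas as pd

# df: DataFrame of numeric feature columns plus a numeric 'target' column
correlations = df.corr()['target'].drop('target')

# Keep features whose absolute correlation with the target exceeds 0.1
selected = correlations[correlations.abs() > 0.1].index
X_selected = df[selected]
```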
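As a rough illustration of the Pearson formula, the sketch below builds a synthetic DataFrame (the column names `x1`, `x2` and the helper `pearson` are hypothetical, introduced only for this example), computes the coefficient from its definition, and applies the same 0.1 threshold.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
df_demo = pd.DataFrame({
    "x1": x1,                          # drives the target
    "x2": rng.normal(size=n),          # unrelated to the target
    "target": 2.0 * x1 + 0.5 * noise,
})

def pearson(a, b):
    """Cov(a, b) / (sigma_a * sigma_b), using population moments."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    return cov / (a.std() * b.std())

# Coefficient from the definition vs. the pandas implementation
manual = {c: pearson(df_demo[c], df_demo["target"]) for c in ["x1", "x2"]}
pandas_corr = df_demo.corr()["target"].drop("target")

# Apply the |rho| > theta rule with theta = 0.1
selected = pandas_corr[pandas_corr.abs() > 0.1].index.tolist()
print(manual)
print(selected)
```

The normalization constant (n versus n - 1) cancels in the ratio, which is why the hand-computed value agrees with pandas.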

3. Chi-Squared Test (χ²)

For categorical targets and categorical features, the Chi-squared test assesses dependence:

\chi^2 = \sum \frac{(O - E)^2}{E}

Where O = observed frequency, E = expected frequency.

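```python
from sklearn.feature_selection import SelectKBest, chi2

# chi2 requires non-negative feature values (e.g. counts or one-hot encodings);
# k must not exceed the number of features in X
X_new = SelectKBest(score_func=chi2, k=10).fit_transform(X, y)
```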
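The statistic itself can be worked through on a small contingency table. The sketch below uses made-up counts and checks the hand computation against SciPy's `chi2_contingency`. Note that this is the classical contingency-table test; scikit-learn's `chi2` scorer is applied to a non-negative feature matrix instead, so the two are related but not interchangeable.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up contingency table: rows = feature category, columns = class label
observed = np.array([
    [30, 10],
    [20, 40],
])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Sum of (O - E)^2 / E over all cells
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# correction=False disables Yates' continuity correction so that SciPy
# evaluates the same plain formula
chi2_scipy, p_value, dof, _ = chi2_contingency(observed, correction=False)

print(expected)
print(chi2_manual, chi2_scipy, dof)
```

A large statistic relative to the degrees of freedom suggests the feature and the target are not independent.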

4. Mutual Information (MI)

Captures any statistical dependence between two variables, including non-linear relationships. For a discrete feature X_j and target y, the mutual information is:

I(X_j; y) = \sum_{x_j} \sum_y p(x_j, y) \log \left( \frac{p(x_j, y)}{p(x_j)p(y)} \right)

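```python
from sklearn.feature_selection import mutual_info_classif

# X: NumPy feature matrix, y: class labels
# random_state fixes the estimator's internal randomness for reproducibility
mi_scores = mutual_info_classif(X, y, random_state=0)

# Keep features whose estimated mutual information exceeds 0.1
mi_selected = X[:, mi_scores > 0.1]
```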
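For discrete variables the double sum can be evaluated directly from empirical frequencies. The following sketch does so on a tiny made-up sample and compares the result with `sklearn.metrics.mutual_info_score`, which also uses natural logarithms. The helper `mutual_information` is hypothetical and only meant to mirror the formula.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Made-up discrete feature and target
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 1, 1, 0])

def mutual_information(x, y):
    """I(X; Y) in nats, estimated from empirical joint frequencies."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            p_x = np.mean(x == xv)
            p_y = np.mean(y == yv)
            if p_xy > 0:  # terms with p(x, y) = 0 contribute nothing
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

mi_manual = mutual_information(x, y)
mi_sklearn = mutual_info_score(x, y)
print(mi_manual, mi_sklearn)
```

For continuous features, `mutual_info_classif` relies on a nearest-neighbor estimate rather than this plug-in sum, so its scores will generally not match a hand computation like this one.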

5. F-test (ANOVA)

Used for classification problems with continuous features and a categorical target. It tests whether a feature's mean differs significantly across the target classes.

F = \frac{\text{Between-group variability}}{\text{Within-group variability}}

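```python
from sklearn.feature_selection import f_classif

# One F statistic and one p-value per feature
F_values, p_values = f_classif(X, y)

# Keep features whose class means differ at the 5% significance level
X_selected = X[:, p_values < 0.05]
```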
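The ratio above can be spelled out as mean squares. The sketch below, using a made-up single feature and a hypothetical helper `anova_f`, computes the between-group and within-group sums of squares, divides each by its degrees of freedom (k - 1 and n - k), and compares the result with `f_classif`.

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Made-up single feature measured in three classes
X_demo = np.array([[1.0], [2.0], [3.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def anova_f(x, y):
    """One-way ANOVA F statistic for a single feature."""
    classes = np.unique(y)
    n, k = len(x), len(classes)
    grand_mean = x.mean()
    # Between-group: class size times squared distance of class mean to grand mean
    ss_between = sum(
        np.sum(y == c) * (x[y == c].mean() - grand_mean) ** 2 for c in classes
    )
    # Within-group: squared deviations from each class's own mean
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_manual = anova_f(X_demo[:, 0], y_demo)
f_sklearn, p_sklearn = f_classif(X_demo, y_demo)
print(f_manual, f_sklearn[0], p_sklearn[0])
```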

Comparison Table

| Method | Target Type | Feature Type | Captures Non-linear | Model-agnostic |
|---|---|---|---|---|
| Variance Threshold | Any | Any | No | Yes |
| Correlation | Continuous | Continuous | No | Yes |
| Chi-Squared | Categorical | Categorical | No | Yes |
| Mutual Information | Any | Any | Yes | Yes |
| ANOVA F-test | Categorical | Continuous | No | Yes |
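The "Captures Non-linear" column can be illustrated with a small experiment. In this sketch, which uses synthetic data chosen for the purpose, the target is a deterministic but purely quadratic function of the feature: the Pearson coefficient stays near zero while the mutual information estimate is clearly positive.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=5000)
y = x ** 2  # fully determined by x, but with no linear trend

# Linear measure: close to zero because the relationship is symmetric in x
pearson_r = np.corrcoef(x, y)[0, 1]

# Non-linear measure: nearest-neighbor estimate of mutual information
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r: {pearson_r:.3f}")
print(f"Mutual information estimate: {mi:.3f}")
```

A correlation filter would discard this feature, whereas a mutual information filter would keep it.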

Implementation of Filter Methods

Step-by-Step

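  1. **Preprocess Data:** Handle missing values, encode categorical variables.
  2. **Scale Features (when needed):** Variance depends on units, so bring features to comparable scales (for example with min-max scaling) before variance-based filtering. Standardizing to unit variance first would make every variance equal to 1; Pearson correlation and the F-test do not change under linear rescaling.
  3. **Apply Filter Criteria:** Use thresholding based on chosen metrics.
  4. **Evaluate Reduced Feature Set:** Optionally use wrapper or embedded methods afterward.
  5. **Train Model:** Proceed with model training using the filtered feature set.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.preprocessing import StandardScaler

# Load sample data (150 samples, 4 features, 3 classes)
iris = load_iris()
X = iris.data
y = iris.target

# Remove low-variance features on the raw data; all four iris features
# are measured in cm, so their variances are directly comparable
selector_var = VarianceThreshold(threshold=0.1)
X_var = selector_var.fit_transform(X)

# Standardize the remaining features (done after the variance filter,
# because standardized features all have variance 1)
scaler = StandardScaler()
X_sca = scaler.fit_transform(X_var)

# ANOVA F-test for classification; k must not exceed the number of
# features, and iris has only 4
sele_f = SelectKBest(score_func=f_classif, k=2)
X_sele = sele_f.fit_transform(X_sca, y)
```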
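One possible way to carry out the evaluation and training steps is to wrap the filters and a model in a scikit-learn `Pipeline`, so that the selection is refit inside each cross-validation fold rather than on the full dataset. The choice of `LogisticRegression`, the step names, and `k=2` are assumptions made for this sketch, not part of the workflow described above.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Filters and model chained together; each fold fits the filters on its
# own training split only
pipe = Pipeline([
    ("variance", VarianceThreshold(threshold=0.1)),
    ("scale", StandardScaler()),
    ("anova", SelectKBest(score_func=f_classif, k=2)),
    ("model", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validated accuracy of the whole pipeline
scores = cross_val_score(pipe, X, y, cv=5)

# Fit once on all data to inspect which features the F-test kept
pipe.fit(X, y)
mask = pipe.named_steps["anova"].get_support()

print(scores.mean())
print(mask)
```

Fitting the filters inside the pipeline keeps information from the validation folds out of the feature scores, which would otherwise make the accuracy estimate optimistic.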

Advantages of Filter Methods

Limitations of Filter Methods

When to Use Filter Methods?

Use filter methods: