Feature Engineering: Scaling, Normalization and Standardization

Last Updated : 7 Apr, 2026

Feature engineering is the process of creating, transforming or selecting important features from raw data to improve model performance. These features help the model capture useful patterns and relationships in the data.


It contributes to model building in the following ways:

Well-designed features help the model to learn complex patterns more effectively.
Removing noise and irrelevant information improves model prediction accuracy.
Focusing on meaningful features helps the model to generalize better and reduces overfitting.
Clear and informative features make the model easier to understand and interpret.

1. Absolute Maximum Scaling

Absolute Maximum Scaling is a feature scaling method where each value is divided by the maximum absolute value of that feature. This transformation rescales the data so that values fall within the range of −1 to 1.

**Sensitive to Outliers:** Extreme values can affect the maximum value and reduce scaling quality.
**Best for Clean Data:** Works better when the dataset does not contain strong outliers.

**Scaling Formula:**

X_{\text{scaled}} = \frac{X_i}{\max(|X|)}
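As a rough illustration of the formula and of its sensitivity to outliers, the sketch below applies it to a small made-up array. The values and the names `x` and `x_scaled` are assumptions chosen for the example, not part of the housing dataset.

```python
import numpy as np

# Hypothetical feature values; -200 plays the role of an outlier
x = np.array([10.0, -50.0, 25.0, -200.0])

# Divide every value by the largest absolute value in the feature
x_scaled = x / np.max(np.abs(x))

print(x_scaled)  # values: 0.05, -0.25, 0.125, -1.0
```

Because the outlier sets the divisor, the three ordinary values end up squeezed into a narrow band close to 0, which is the loss of scaling quality described above.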

**Implementation**

The examples below use a housing dataset saved as Housing.csv.

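**Step 1: Import Libraries and Dataset**

```python
import pandas as pd
import numpy as np

# Load the housing dataset
df = pd.read_csv('Housing.csv')

# Keep only the numeric columns, since scaling applies to numbers
df = df.select_dtypes(include=np.number)
df.head()
```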


**Step 2: Apply Absolute Maximum Scaling**

**np.max(np.abs(df), axis=0):** Calculates the maximum absolute value for each column.
**df / max_abs:** Divides each value by the maximum absolute value of its column to scale the data.
**scaled_df.head():** Displays the first few rows of the scaled dataset.

```python
# Maximum absolute value of each column
max_abs = np.max(np.abs(df), axis=0)

# Divide each column by its own maximum absolute value
scaled_df = df / max_abs

scaled_df.head()
```
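scikit-learn also ships a `MaxAbsScaler` that performs the same per-column division. The sketch below compares it with the manual approach on a small stand-in DataFrame; `toy_df` and its values are hypothetical and only used so the example runs without Housing.csv.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical stand-in for the numeric housing columns
toy_df = pd.DataFrame({
    "price": [250000.0, 400000.0, 1000000.0],
    "bedrooms": [2.0, 3.0, 5.0],
})

# Manual version: divide each column by its maximum absolute value
manual = toy_df / toy_df.abs().max()

# scikit-learn version of the same transformation
sk = pd.DataFrame(MaxAbsScaler().fit_transform(toy_df), columns=toy_df.columns)

print(np.allclose(manual.to_numpy(), sk.to_numpy()))  # True
```

Using the scaler object has one practical benefit: the fitted maximum values are stored, so the same scaling can later be applied to new data with `transform`.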


2. Min-Max Scaling

Min-Max Scaling rescales features by subtracting the minimum value and dividing by the difference between the maximum and minimum values. This maps feature values to the range 0 to 1 while preserving the shape of the original distribution.

**Scaling Formula:**

X_{\text{scaled}} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}
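To make the formula concrete, the following sketch applies it by hand to a single made-up column and checks the result against `MinMaxScaler`. The array `x` is an assumption for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single feature as a column vector; 110 is much larger than the rest
x = np.array([[10.0], [20.0], [30.0], [110.0]])

# Manual formula: (x - min) / (max - min), computed per column
x_min = x.min(axis=0)
x_max = x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)

# Same transformation with scikit-learn
sk = MinMaxScaler().fit_transform(x)

print(manual.ravel())          # values: 0.0, 0.1, 0.2, 1.0
print(np.allclose(manual, sk)) # True
```

The smallest value maps to 0 and the largest to 1, while the relative spacing of the values is unchanged.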

**Implementation**

**MinMaxScaler():** Creates a scaler object for Min-Max scaling.
**scaler.fit_transform(df):** Calculates min and max values and scales the dataset between 0 and 1.

```python
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the data and scale every column to [0, 1]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)

# Wrap the resulting NumPy array back into a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()
```


3. Normalization (Vector Normalization)

Normalization scales each data sample so that its vector length (Euclidean norm) becomes 1. It focuses on the direction of data points rather than their magnitude, making it useful in tasks like text classification and clustering.

**Scaling Formula:**

X_{\text{scaled}} = \frac{X_i}{\| X \|}

**Where:**

X_i is each individual value in the sample.
\| X \| is the Euclidean norm (length) of the sample vector X.

Normalizes each sample to unit length.
Useful for direction-based similarity metrics.
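A small worked example may help show what "direction rather than magnitude" means. In the sketch below the rows of `X` are made-up samples; the first and third rows point in the same direction but differ in length.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical samples (one per row); row 2 is row 0 scaled by a factor of 2
X = np.array([
    [3.0, 4.0],
    [1.0, 0.0],
    [6.0, 8.0],
])

# Manual formula: divide each row by its Euclidean norm
norms = np.linalg.norm(X, axis=1, keepdims=True)
manual = X / norms

# Same transformation with scikit-learn (L2 norm)
sk = Normalizer(norm="l2").fit_transform(X)

print(manual[0], manual[2])     # both rows become 0.6, 0.8
print(np.allclose(manual, sk))  # True
```

After normalization the first and third samples are identical, because only their direction is kept. Note that this operates row by row, unlike the other techniques in this article, which operate column by column.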

**Implementation**

**Normalizer():** Creates a normalizer object to scale data.
**scaler.fit_transform(df):** Normalizes each row so its vector length becomes 1.

```python
from sklearn.preprocessing import Normalizer

# Scale each row (sample) to unit Euclidean length
scaler = Normalizer()
scaled_data = scaler.fit_transform(df)

# Wrap the resulting NumPy array back into a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()
```


4. Standardization

Standardization scales features by subtracting the mean and dividing by the standard deviation. This transforms the data so that features have zero mean and unit variance, which helps many machine learning models perform better.

**Scaling Formula:**

X_{\text{scaled}} = \frac{X_i - \mu}{\sigma}

where \mu is the mean and \sigma is the standard deviation of the feature.
Produces features with mean 0 and variance 1.
Most effective when the data is approximately normally distributed.
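The formula can be checked by hand on a tiny made-up column, as sketched below. One detail worth knowing: `StandardScaler` divides by the population standard deviation (`ddof=0`), whereas pandas `.std()` defaults to the sample standard deviation (`ddof=1`), so a manual pandas version can differ slightly unless `ddof=0` is passed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature; its mean is 5 and population std is 2
x = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

# Manual formula: (x - mean) / std, using the population std (ddof=0)
mu = x.mean(axis=0)
sigma = x.std(axis=0)  # NumPy's default is ddof=0
manual = (x - mu) / sigma

# Same transformation with scikit-learn
sk = StandardScaler().fit_transform(x)

print(manual.ravel())          # values: -1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0
print(np.allclose(manual, sk)) # True
```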

**Implementation**

**StandardScaler():** Creates a scaler for standardizing the data.
**scaler.fit_transform(df):** Subtracts the mean and divides by the standard deviation.

```python
from sklearn.preprocessing import StandardScaler

# Center each column on its mean and scale to unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Wrap the resulting NumPy array back into a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
```


5. Robust Scaling

Robust Scaling scales features using the median and interquartile range (IQR) instead of the mean and standard deviation. This makes it less sensitive to outliers and skewed data, making it suitable for datasets with extreme values or noise.

**Scaling Formula:**

X_{\text{scaled}} = \frac{X_i - X_{\text{median}}}{\text{IQR}}
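As a minimal sketch of the formula, the example below computes the median and IQR by hand for a made-up column containing one extreme value, then compares the result with `RobustScaler` using its default 25th to 75th percentile range. The array `x` is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical single feature; 100 is an outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Manual formula: (x - median) / IQR
median = np.median(x, axis=0)                # 3.0
q1, q3 = np.percentile(x, [25, 75], axis=0)  # 2.0 and 4.0
manual = (x - median) / (q3 - q1)

# Same transformation with scikit-learn
sk = RobustScaler().fit_transform(x)

print(manual.ravel())          # values: -1.0, -0.5, 0.0, 0.5, 48.5
print(np.allclose(manual, sk)) # True
```

The outlier is still present after scaling, but it does not influence the median or the IQR, so the ordinary values keep a sensible spread around 0.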

**Implementation**

**RobustScaler():** Creates a scaler that uses median and IQR for scaling.
**scaler.fit_transform(df):** Scales the data while reducing the influence of outliers.

```python
from sklearn.preprocessing import RobustScaler

# Center each column on its median and scale by its IQR
scaler = RobustScaler()
scaled_data = scaler.fit_transform(df)

# Wrap the resulting NumPy array back into a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
```


Comparison of Various Feature Scaling Techniques

Let's see the key differences across the five main feature scaling techniques commonly used in machine learning preprocessing.

| Type | Method Description | Sensitivity to Outliers | Typical Use Cases |
|---|---|---|---|
| Absolute Maximum Scaling | Divides values by the maximum absolute value of each feature | High | Sparse data, simple scaling |
| Min-Max Scaling (Normalization) | Scales features to the range [0, 1] by min-max normalization | High | Neural networks, bounded input features |
| Normalization (Vector Norm) | Scales each sample vector to unit length (norm = 1) | Not applicable (per row) | Direction-based similarity, text classification |
| Standardization (Z-Score) | Centers features to mean 0 and scales to unit variance | Moderate | Most ML algorithms, assumes approximately normal data |
| Robust Scaling | Centers on the median and scales using the IQR | Low | Data with outliers, skewed distributions |
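The outlier sensitivity column can be illustrated with a rough experiment: scale a made-up column that contains one outlier and measure how much room is left for the ordinary values. The data and the `spreads` dictionary below are assumptions for illustration; vector normalization is left out because it works per row rather than per column.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, StandardScaler, RobustScaler
)

# Hypothetical single feature: four ordinary values and one outlier (100)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scalers = {
    "abs-max": MaxAbsScaler(),
    "min-max": MinMaxScaler(),
    "standard": StandardScaler(),
    "robust": RobustScaler(),
}

# Spread = range covered by the four ordinary values after scaling
spreads = {}
for name, scaler in scalers.items():
    scaled = scaler.fit_transform(x).ravel()
    spreads[name] = scaled[:4].max() - scaled[:4].min()
    print(f"{name:>8}: inlier spread = {spreads[name]:.3f}")
```

On this toy column, the ordinary values are compressed into a tiny interval by absolute maximum and min-max scaling, somewhat less by standardization, and keep the widest spread under robust scaling, which matches the ordering in the table.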

Advantages

**Improves Model Performance:** Enhances accuracy and predictive power by presenting features in comparable scales.
**Speeds Up Convergence:** Helps gradient-based algorithms train faster and more reliably.
**Prevents Feature Bias:** Avoids dominance of large-scale features, ensuring fair contribution from all features.
**Increases Numerical Stability:** Reduces risks of overflow/underflow in computations.
**Facilitates Algorithm Compatibility:** Makes data suitable for distance- and gradient-based models like SVM, KNN and neural networks.
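The point about feature bias can be sketched with a nearest-neighbor style distance check. In the made-up data below, the columns are assumed to be income and age, and the helper `nearest_to_first` is a hypothetical function written for this example, not a library call.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: columns are income and age
X = np.array([
    [50000.0, 25.0],  # row 0: the query point
    [50500.0, 60.0],  # row 1: similar income, very different age
    [50800.0, 26.0],  # row 2: slightly different income, almost the same age
])

def nearest_to_first(points):
    """Return the index (1 or 2) of the row closest to row 0 (Euclidean distance)."""
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return int(np.argmin(dists)) + 1

raw_nearest = nearest_to_first(X)
scaled_nearest = nearest_to_first(StandardScaler().fit_transform(X))

print(raw_nearest, scaled_nearest)  # 1 2
```

Before scaling, income dominates the distance simply because its numbers are larger, so row 1 looks closest despite a 35-year age gap. After standardization both features contribute on a comparable scale and row 2 becomes the nearest point.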