Saba Shahrukh May 28, 2025

Normalizing Continuous Features in Machine Learning

Normalization, in the context of continuous features in machine learning, is the process of rescaling the values of numerical features to a standard range. The primary goal is to ensure that all features contribute more equally to the model training process, preventing features with larger magnitudes from dominating those with smaller magnitudes. This often leads to improved model performance and stability.
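
To make the point about dominating magnitudes concrete, here is a minimal illustrative sketch (with made-up age and income values): the Euclidean distance between two samples is driven almost entirely by the larger-magnitude feature until both features are rescaled.

Python

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two illustrative (made-up) samples: [age in years, income in dollars]
a = np.array([[25, 50_000.0]])
b = np.array([[55, 52_000.0]])

# Without scaling, the income gap (2000) swamps the age gap (30)
print(np.linalg.norm(a - b))  # ~2000.2, dominated by income

# After min-max scaling both features to [0, 1], each feature contributes comparably
scaler = MinMaxScaler().fit(np.vstack([a, b]))
a_scaled, b_scaled = scaler.transform(a), scaler.transform(b)
print(np.linalg.norm(a_scaled - b_scaled))  # ~1.41, both features now contribute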

Here are the common methods for normalizing continuous features in Python, along with example code using the sklearn.preprocessing module:

1. Min-Max Scaling (Normalization)

This technique scales and translates each feature individually so that its values on the training set fall within a given range, often between zero and one. The formula is:

X_scaled = (X − X_min) / (X_max − X_min)

  • When to Use:
    • When you need feature values to be within a specific bounded interval (e.g., [0, 1] or [-1, 1]).
    • When the distribution of your data is not assumed to be Gaussian.
    • For algorithms sensitive to the magnitude of features, such as:
      • Neural Networks (where input values in a small range can help with gradient descent).
      • Distance-based algorithms like k-Nearest Neighbors and Support Vector Machines (where feature scales can affect distance calculations).
  • Use Cases:
    • Image processing (pixel intensities are often normalized to [0, 1]).
    • When features have significantly different ranges (e.g., age vs. income).

Python

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Sample data
data = pd.DataFrame({
    'feature_a': np.array([10, 50, 100, 20]),
    'feature_b': np.array([0.5, 2.0, 0.1, 1.0])
})

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Fit the scaler to the data and then transform it
data[['feature_a_scaled', 'feature_b_scaled']] = scaler.fit_transform(data[['feature_a', 'feature_b']])

print("Original Data:\n", data[['feature_a', 'feature_b']])
print("\nMin-Max Scaled Data:\n", data[['feature_a_scaled', 'feature_b_scaled']])

Original Data:
    feature_a  feature_b
0         10        0.5
1         50        2.0
2        100        0.1
3         20        1.0

Min-Max Scaled Data:
   feature_a_scaled  feature_b_scaled
0          0.000000          0.210526
1          0.444444          1.000000
2          1.000000          0.000000
3          0.111111          0.473684
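
As a quick sanity check of the formula, the second value of feature_a maps to (50 − 10) / (100 − 10) = 40 / 90 ≈ 0.444, and the first value of feature_b maps to (0.5 − 0.1) / (2.0 − 0.1) = 0.4 / 1.9 ≈ 0.211, matching the output above. If you need a different bounded interval, MinMaxScaler also accepts a feature_range argument, e.g. MinMaxScaler(feature_range=(-1, 1)).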

2. Standard Scaler (Z-score Normalization)

This method standardizes features by removing the mean and scaling to unit variance. The formula is:

X_scaled = (X − μ) / σ

where μ is the mean and σ is the standard deviation of the feature.

  • When to Use:
    • When your data is assumed to have a Gaussian or near-Gaussian distribution.
    • For algorithms that benefit from features having a mean of 0 and a standard deviation of 1, such as:
      • Linear Regression
      • Logistic Regression
      • Support Vector Machines (especially with RBF kernel)
      • Principal Component Analysis (PCA)
  • Use Cases:
    • General-purpose scaling for many machine learning algorithms.
    • When you want to compare features with different units and scales.

Python

from sklearn.preprocessing import StandardScaler

# Initialize StandardScaler
scaler = StandardScaler()

# Fit and transform the data
data[['feature_a_standardized', 'feature_b_standardized']] = scaler.fit_transform(data[['feature_a', 'feature_b']])

print("\nStandard Scaled Data:\n", data[['feature_a_standardized', 'feature_b_standardized']])

Standard Scaled Data:
   feature_a_standardized  feature_b_standardized
0               -1.000000               -0.562878
1                0.142857                1.547915
2                1.571429               -1.125756
3               -0.714286                0.140720
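
To verify against the formula: feature_a has mean μ = (10 + 50 + 100 + 20) / 4 = 45 and population standard deviation σ = 35 (StandardScaler divides by N, not N − 1), so the first value standardizes to (10 − 45) / 35 = −1.0, as shown above.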

3. Robust Scaler

This scaler removes the median and scales the data according to the interquartile range (IQR). It is less affected by outliers. The formula is:

X_scaled = (X − median) / IQR

  • When to Use:
    • When your data contains significant outliers. Standard Scaler and Min-Max Scaling are sensitive to outliers, which can skew the scaling.
    • When you want to reduce the impact of extreme values on the scaling of other data points.
  • Use Cases:
    • Financial data with potential for extreme market events.
    • Sensor data that might have occasional erroneous readings.

Python

from sklearn.preprocessing import RobustScaler

# Initialize RobustScaler
scaler = RobustScaler()

# Fit and transform the data
data[['feature_a_robust', 'feature_b_robust']] = scaler.fit_transform(data[['feature_a', 'feature_b']])

print("\nRobust Scaled Data:\n", data[['feature_a_robust', 'feature_b_robust']])

Robust Scaled Data:
   feature_a_robust  feature_b_robust
0         -0.555556         -0.294118
1          0.333333          1.470588
2          1.444444         -0.764706
3         -0.333333          0.294118
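
To see where these numbers come from: for feature_a the median is 35 and, with the linear quantile interpolation used by default, the 25th and 75th percentiles are 17.5 and 62.5, giving an IQR of 45; the first value therefore scales to (10 − 35) / 45 ≈ −0.556. RobustScaler also exposes a quantile_range parameter if you want to scale by a different percentile span.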

4. Power Transformer (Yeo-Johnson and Box-Cox)

These are variance-stabilizing transformations that can also help make the data more Gaussian-like.

  • Box-Cox: Requires strictly positive data.
  • Yeo-Johnson: Can handle zero and negative values.
  • When to Use:
    • When your data is skewed, and you want to make it more symmetric or closer to a normal distribution.
    • Can improve the performance of linear models that assume normality.
  • Use Cases:
    • Transforming income or sales data that often has a long right tail.
    • Preprocessing data for statistical models that assume normality.

Python

from sklearn.preprocessing import PowerTransformer

# Initialize PowerTransformer (default method is 'yeo-johnson')
power_transformer = PowerTransformer(method='yeo-johnson')

# Fit and transform the data
data[['feature_a_power', 'feature_b_power']] = power_transformer.fit_transform(data[['feature_a', 'feature_b']])

print("\nPower Transformed Data:\n", data[['feature_a_power', 'feature_b_power']])

Power Transformed Data:
   feature_a_power  feature_b_power
0        -1.224745        -0.094466
1         0.598713         1.356175
2         1.507855        -1.497633
3        -0.881823         0.235924
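
The fitted exponents are available after fitting, which can be useful for checking how strongly each feature was transformed. A small sketch using the lambdas_ attribute (one λ per column), continuing from the code above:

Python

# Inspect the Yeo-Johnson lambda estimated for each feature;
# a value near 1 means the feature was left almost unchanged (before standardization).
print(power_transformer.lambdas_)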

5. Quantile Transformer

This method transforms the features to follow a uniform or a normal distribution. It does this by estimating each feature's empirical quantiles (ranks) and mapping values through them onto the desired output distribution.

  • When to Use:
    • When you want to transform your data to a specific distribution (uniform or normal).
    • Can make features with different distributions more comparable.
    • Less sensitive to outliers than Min-Max or Standard Scaler.
  • Use Cases:
    • Making features with highly non-linear relationships more amenable to linear models.
    • Non-linear scaling to achieve a specific distribution.

Python

from sklearn.preprocessing import QuantileTransformer

# Initialize QuantileTransformer with uniform output
quantile_transformer_uniform = QuantileTransformer(n_quantiles=4, output_distribution='uniform', random_state=0)
data[['feature_a_quantile_uniform', 'feature_b_quantile_uniform']] = quantile_transformer_uniform.fit_transform(data[['feature_a', 'feature_b']])
print("\nQuantile Transformed (Uniform) Data:\n", data[['feature_a_quantile_uniform', 'feature_b_quantile_uniform']])

# Initialize QuantileTransformer with normal output
quantile_transformer_normal = QuantileTransformer(n_quantiles=4, output_distribution='normal', random_state=0)
data[['feature_a_quantile_normal', 'feature_b_quantile_normal']] = quantile_transformer_normal.fit_transform(data[['feature_a', 'feature_b']])
print("\nQuantile Transformed (Normal) Data:\n", data[['feature_a_quantile_normal', 'feature_b_quantile_normal']])

Quantile Transformed (Uniform) Data:
   feature_a_quantile_uniform  feature_b_quantile_uniform
0                    0.000000                    0.333333
1                    0.666667                    1.000000
2                    1.000000                    0.000000
3                    0.333333                    0.666667

Quantile Transformed (Normal) Data:
   feature_a_quantile_normal  feature_b_quantile_normal
0                  -5.199338                  -0.430727
1                   0.430727                   5.199338
2                   5.199338                  -5.199338
3                  -0.430727                   0.430727
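
With only four samples and n_quantiles=4, each value simply lands on its empirical rank: 0, 1/3, 2/3, or 1 in the uniform case. For the normal output those ranks are passed through the normal quantile function, and the extreme ranks 0 and 1 are clipped to roughly ±5.2 rather than ±infinity. On realistically sized datasets you would keep the default n_quantiles=1000 (capped at the number of samples).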

Choosing the Right Normalization Technique

The choice of normalization method depends on the characteristics of your data and the requirements of the machine learning algorithm you plan to use:

  • Consider the distribution of your data: If your data is approximately Gaussian, Standard Scaler might be a good choice. For non-Gaussian data or when you want a specific range, Min-Max Scaling is often used.
  • Think about outliers: If your data has significant outliers, Robust Scaler, Power Transformer, or Quantile Transformer can be more appropriate as they are less sensitive to extreme values.
  • Consider the algorithm: Some algorithms are more sensitive to feature scaling than others. Distance-based methods and gradient-based optimization algorithms often benefit from normalization.
  • Experimentation is key: It’s often a good practice to try different scaling methods and evaluate their impact on your model’s performance through cross-validation (a leakage-free setup is sketched below).
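
One practical caveat that applies to every scaler above: fit the scaler on the training data only and reuse those statistics on the test data, otherwise information leaks from the test set into preprocessing. Here is a minimal sketch of a leakage-free setup, using hypothetical toy data X and y and a scikit-learn Pipeline so the scaler is re-fit inside each cross-validation fold:

Python

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Hypothetical toy data: 100 samples, 2 continuous features on very different scales
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2)) * [10, 0.1]
y = (X[:, 0] + 100 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fit only on the training portion inside each CV fold
model = make_pipeline(StandardScaler(), LogisticRegression())
print(cross_val_score(model, X_train, y_train, cv=5).mean())  # CV on training data only

model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # final evaluation on the held-out test data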

By understanding these different normalization techniques and when to apply them, you can effectively preprocess your continuous features and potentially improve the performance and stability of your machine learning models.
