
Feature Scaling (Normalization & Standardization)


Feature scaling is a crucial preprocessing step in machine learning, especially for models that rely on distance or gradient-based optimization (such as k-NN, SVMs, logistic regression, and neural networks).

⚖️ What is Feature Scaling?

Feature scaling transforms your data so that features have similar ranges or distributions. This ensures no single feature dominates just because of its scale.

📊 Two Main Methods:

🔹 1. Normalization (Min-Max Scaling)

  • Formula: x_{\text{scaled}} = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}
  • Range: Scales data to [0, 1] (or another range you choose).
  • Use When: The data has a known, bounded range, or the model expects inputs in a fixed interval (e.g., neural networks, pixel intensities).

✅ Example:

If a feature has values from 20 to 80, a value of 50 becomes:

\frac{50 - 20}{80 - 20} = \frac{30}{60} = 0.5
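A minimal sketch of min-max scaling with scikit-learn's MinMaxScaler, using a made-up feature column that reproduces the 20–80 example above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature column ranging from 20 to 80 (illustrative data only).
X = np.array([[20.0], [35.0], [50.0], [65.0], [80.0]])

# MinMaxScaler rescales each feature to [0, 1] by default:
# x_scaled = (x - x_min) / (x_max - x_min)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # [0.   0.25 0.5  0.75 1.  ] -- 50 maps to 0.5, as in the worked example
```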

🔸 2. Standardization (Z-score Normalization)

  • Formula: x_{\text{scaled}} = \frac{x - \mu}{\sigma}
  • Result: Centers data at mean 0 with standard deviation 1; the output is not bounded to a fixed range.
  • Use When: Data may contain outliers or doesn’t have a known range; used with PCA, logistic regression, etc.

✅ Example:

If a feature has a mean of 100 and standard deviation of 20, a value of 120 becomes:

\frac{120 - 100}{20} = 1
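The same transformation is available as scikit-learn's StandardScaler. Note that the scaler estimates the mean and standard deviation from the data it is fit on, so the toy values below yield different statistics than the μ = 100, σ = 20 assumed in the worked example:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative feature column; the scaler computes its own mean and std from this data.
X = np.array([[80.0], [90.0], [100.0], [110.0], [120.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.mean_, scaler.scale_)  # mean = 100, std ≈ 14.14 for this toy column
print(X_scaled.ravel())             # 120 maps to (120 - 100) / 14.14 ≈ 1.41
```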

🧠 When to Scale:

  • Algorithms that use distance (k-NN, SVM); a pipeline sketch combining scaling with k-NN follows this list
  • Algorithms that rely on gradient descent (logistic regression, neural nets)
  • Principal Component Analysis (PCA)
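Because the scaler's statistics should come from the training data only (to avoid leaking information from the test set), scaling is typically wrapped in a pipeline with the estimator. A minimal sketch, assuming scikit-learn and the built-in Iris dataset purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training split only, then applies the
# same transformation to the test split before k-NN computes distances.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # test-set accuracy
```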

⚠️ When Not Necessary:

  • Tree-based models (e.g., decision trees, random forests, XGBoost) generally don’t require scaling.
