- Machine Learning for Developers
- Rodolfo Bonnin
- 2021-07-02 15:46:52
Normalization or standardization
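This technique rescales the dataset so that each feature has the mean and standard deviation of a standard normal distribution, that is, a mean of 0 and a standard deviation of 1. It shifts and rescales the values; it does not change the shape of their distribution.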
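These properties are obtained by computing the so-called z-scores of the dataset samples with the following formula:

$$z = \frac{x - \mu}{\sigma}$$

Here $x$ is a sample value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation.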
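To make the z-score computation concrete, here is a minimal sketch in plain Python, assuming the usual definition z = (x - mean) / standard deviation. The helper name `z_scores` and the sample values are hypothetical and chosen only for illustration; they do not come from the MPG file.

```python
import statistics


def z_scores(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # Population standard deviation (divides by N), the same convention
    # that scikit-learn's StandardScaler uses.
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mean) / std for x in values]


# Hypothetical acceleration readings, for illustration only.
sample = [12.0, 11.5, 11.0, 12.0, 10.5]
scaled = z_scores(sample)
print(scaled)
```

After the transformation, the values in `scaled` have a mean of 0 and a standard deviation of 1, up to floating-point rounding.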

Let's visualize and practice this concept with the help of scikit-learn, reading a file from the MPG dataset, which records city-cycle fuel consumption in miles per gallon and contains the following columns: mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin, and car name.
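from sklearn import preprocessing
import pandas as pd
import matplotlib.pyplot as plt

# Load the MPG dataset and list the available columns
df = pd.read_csv("data/mpg.csv")
print(df.columns)

# Keep only the two columns we want to compare
partialcolumns = df[['acceleration', 'mpg']]

# Fit the scaler (it learns each column's mean and standard deviation), then apply it
std_scale = preprocessing.StandardScaler().fit(partialcolumns)
df_std = std_scale.transform(partialcolumns)

# Plot the original data (grey triangles) and the standardized data on the same axes
plt.figure(figsize=(10, 8))
plt.scatter(partialcolumns['acceleration'], partialcolumns['mpg'], color="grey", marker='^')
plt.scatter(df_std[:, 0], df_std[:, 1])
plt.show()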
The following picture lets us compare the non-normalized and normalized representations of the data:

Depiction of the original dataset, and its normalized counterpart.
It is very important to denormalize the resulting data at evaluation time so that it remains representative of the original data. This matters especially when the model is applied to regression, where the predicted values are of little use unless they are scaled back to the original units.
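The round trip can be sketched as follows, assuming the target was standardized with z-scores before training. The `Standardizer` class, the mpg values, and the "predictions" are hypothetical stand-ins; in scikit-learn, the `inverse_transform` method of `StandardScaler` serves the same purpose.

```python
import statistics


class Standardizer:
    """Stores the mean and standard deviation needed to undo the scaling."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)  # population std (divides by N)
        if self.std == 0:
            raise ValueError("all values are identical; cannot standardize")
        return self

    def transform(self, values):
        return [(x - self.mean) / self.std for x in values]

    def inverse_transform(self, scaled):
        # Invert z = (x - mean) / std, which gives x = z * std + mean.
        return [z * self.std + self.mean for z in scaled]


# Hypothetical mpg targets, for illustration only.
mpg = [18.0, 15.0, 16.0, 17.0, 24.0]
scaler = Standardizer().fit(mpg)
mpg_scaled = scaler.transform(mpg)

# Stand-in for a regression model's output in the scaled space.
predicted_scaled = [0.0, 1.0]

# Convert the predictions back to miles per gallon before evaluating them.
predicted_mpg = scaler.inverse_transform(predicted_scaled)
print(predicted_mpg)
```

The key design point is that the mean and standard deviation learned from the training data are kept, because the same two numbers are needed to map predictions back to the original units.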