
Feature scaling

Feature scaling is an important engineering technique that is necessary even with neural networks. The numerical input must be scaled so that all features are on the same scale; otherwise, the network will give more importance to features with larger numerical values.

A very simple transformation is rescaling the input to lie between 0 and 1, also known as MinMax scaling. Another common operation is standardization, or zero-mean, unit-variance translation, which ensures the input has a mean of 0 and a standard deviation of 1; in the scikit-learn library this is implemented in the scale function:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[-3., 1., 2.],
                    [2., 0., 0.],
                    [1., 2., 3.]])
X_scaled = preprocessing.scale(X_train)

The preceding command generates the following result:

Out[2]:
array([[-1.38873015,  0.        ,  0.26726124],
       [ 0.9258201 , -1.22474487, -1.33630621],
       [ 0.46291005,  1.22474487,  1.06904497]])
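
The MinMax scaling mentioned earlier is implemented in scikit-learn's MinMaxScaler class. The following is a minimal sketch applying it to the same X_train array:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[-3., 1., 2.],
                    [2., 0., 0.],
                    [1., 2., 3.]])

# Rescale each feature (column) to the [0, 1] range:
# (x - min) / (max - min), computed per column
min_max_scaler = preprocessing.MinMaxScaler()
X_minmax = min_max_scaler.fit_transform(X_train)

The fitted scaler stores the per-feature minimum and range, so the same transformation can later be applied to new data with min_max_scaler.transform.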

Many other numerical transformations are already available in scikit-learn. Two other important transformations from its documentation are as follows:

  • PowerTransformer: This transformation applies a power transformation to each feature in order to make the data follow a more Gaussian-like distribution. It finds the optimal scaling factor to stabilize the variance and, at the same time, minimize skewness. By default, scikit-learn's PowerTransformer also standardizes the output, forcing the mean to be 0 and the variance to be 1.
  • QuantileTransformer: This transformation maps the features to a uniform distribution by default; its output_distribution parameter allows us to force a Gaussian distribution instead. It introduces saturation for our inputs' extreme values (see the sketch after this list).
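
As a minimal sketch of how these two transformers can be used, consider the following; the lognormal sample data and the parameter values are illustrative assumptions, not taken from the text:

from sklearn.preprocessing import PowerTransformer, QuantileTransformer
import numpy as np

rng = np.random.RandomState(0)
# Skewed, strictly positive sample data to illustrate both transformations
X = rng.lognormal(size=(1000, 1))

# PowerTransformer (Yeo-Johnson method by default) finds the power that
# stabilizes variance and minimizes skewness; with standardize=True
# (the default) the output has zero mean and unit variance
pt = PowerTransformer()
X_power = pt.fit_transform(X)

# QuantileTransformer maps to a uniform distribution by default;
# output_distribution='normal' forces a Gaussian instead. Extreme
# input values saturate at the boundaries of the learned quantiles
qt = QuantileTransformer(output_distribution='normal', n_quantiles=100)
X_quantile = qt.fit_transform(X)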