官术网_书友最值得收藏!

Scaling

Values of different features can differ by orders of magnitude. Sometimes, this may mean that the larger values dominate the smaller values. This depends on the algorithm we're using. For certain algorithms to work properly, we're required to scale the data.

There are following several common strategies that we can apply:

  • Standardization removes the mean of a feature and divides by the standard deviation. If the feature values are normally distributed, we'll get a Gaussian, which is centered around zero with a variance of one.
  • If the feature values aren't normally distributed, we can remove the median and divide by the interquartile range. The interquartile range is a range between the first and third quartile (or 25th and 75th percentile).
  • Scaling features to a range is a common choice of range between zero and one.
主站蜘蛛池模板: 英吉沙县| 临沂市| 七台河市| 平塘县| 罗定市| 金阳县| 玉林市| 安陆市| 桃源县| 宁国市| 紫阳县| 荣成市| 上饶县| 通城县| 泸定县| 潼关县| 布尔津县| 金乡县| 准格尔旗| 余庆县| 黄大仙区| 共和县| 水富县| 闻喜县| 健康| 莱西市| 鱼台县| 迭部县| 徐闻县| 德州市| 个旧市| 塔城市| 大庆市| 阳东县| 阿拉善左旗| 平罗县| 肥乡县| 梅河口市| 翼城县| 巴马| 永吉县|