
Speeding up the training process using batch normalization

In the previous section on scaling the input dataset, we learned that optimization is slow when the input data is not scaled (that is, when it does not lie between zero and one).

The hidden layer values could be high in the following scenarios:

  • Input data values are high
  • Weight values are high
  • The product of the weights and the inputs is high

Any of these scenarios can result in a large output value on the hidden layer.

Note that a hidden layer acts as the input layer to the layer that follows it. Hence, the phenomenon of high input values resulting in slow optimization holds true when the hidden layer values are large as well.
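To see why large hidden layer values slow down training, consider a sigmoid activation (one common choice; the values below are purely illustrative). The following NumPy sketch shows that the sigmoid's gradient collapses towards zero as the pre-activation grows, so the corresponding weight updates become tiny:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# The derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)).
# As the pre-activation grows, this derivative shrinks towards zero,
# which makes gradient-based optimization slow.
for value in [0.5, 5.0, 50.0]:
    s = sigmoid(value)
    print(f"pre-activation={value:>5}: sigmoid={s:.6f}, gradient={s * (1 - s):.6f}")
```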

Batch normalization comes to the rescue in this scenario. We have already learned that, when input values are high, we scale them to reduce their magnitude. We have also learned that scaling can be performed in a different way: by subtracting the mean of the input and dividing by the standard deviation of the input. Batch normalization performs this form of scaling.
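As a quick illustration of this scaling method, the following NumPy sketch (with made-up values) subtracts the mean and divides by the standard deviation, one column at a time:

```python
import numpy as np

# Hypothetical hidden layer values for a batch of three samples and two units
hidden_values = np.array([[120.0, 3.0],
                          [ 80.0, 5.0],
                          [100.0, 4.0]])

# Scale each unit: subtract the batch mean and divide by the batch standard deviation
mean = hidden_values.mean(axis=0)
std = hidden_values.std(axis=0)
scaled = (hidden_values - mean) / std

print(scaled)  # every column now has zero mean and unit standard deviation
```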

Typically, all values are scaled using the following formula. For a batch of values $x_1, \dots, x_m$:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$

Notice that γ and β are learned during training, along with the original parameters of the network.
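In practice, batch normalization is available as a ready-made layer in most deep learning frameworks. The following is a minimal sketch assuming a Keras Sequential model; the layer sizes and input dimension are illustrative and not taken from the original text:

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

model = Sequential()
model.add(Dense(1000, input_dim=784))
# Normalizes the pre-activations using the batch mean and variance,
# then rescales them with the learned gamma and beta parameters
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Placing the BatchNormalization layer before the activation keeps the values fed into the activation function in a moderate range, which is what speeds up training.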
