
Speeding up the training process using batch normalization

In the previous section on scaling the dataset, we learned that optimization is slow when the input data is not scaled (that is, when it does not lie between zero and one).

The hidden layer values could be high in the following scenarios:

  • Input data values are high
  • Weight values are high
  • The product of the weight and input values is high

Any of these scenarios can result in a large output value on the hidden layer.
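
To make this concrete, here is a minimal NumPy sketch (the input range, layer sizes, and sigmoid activation are illustrative assumptions, not values from this book's examples). Large raw inputs push the hidden layer's sigmoid output to saturate near zero or one, where gradients are tiny, while the same inputs scaled to between zero and one keep the output in the responsive middle range:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)

# Hypothetical raw inputs (for example, pixel values between 0 and 255)
x_unscaled = np.random.uniform(0, 255, size=(1, 4))
x_scaled = x_unscaled / 255                  # the same inputs, scaled to [0, 1]
w = np.random.uniform(-1, 1, size=(4, 3))    # hidden-layer weights

print(sigmoid(x_unscaled @ w))  # saturated near 0 or 1 -> near-zero gradients
print(sigmoid(x_scaled @ w))    # stays in the responsive middle range
```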

Note that the output of a hidden layer serves as the input to the next layer (here, the output layer). Hence, the phenomenon of high input values resulting in slow optimization holds true when the hidden layer values are large as well.

Batch normalization comes to the rescue in this scenario. We have already learned that, when input values are high, we perform scaling to reduce them. We have also learned that scaling can be performed using a different method: subtracting the mean of the input and dividing it by the standard deviation of the input. Batch normalization performs this method of scaling on the hidden layer values.
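
As a quick illustration of that method of scaling, the following sketch standardizes a handful of made-up values so that they end up with a mean of zero and a standard deviation of one:

```python
import numpy as np

x = np.array([10., 20., 30., 40., 50.])

# Scale by subtracting the mean and dividing by the standard deviation
x_standardized = (x - x.mean()) / x.std()

print(x_standardized)                         # values centered around zero
print(x_standardized.mean(), x_standardized.std())  # ~0.0 and ~1.0
```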

Typically, all values are scaled using the following formula, where $\mu_B$ and $\sigma_B^2$ are the mean and variance of the values in the current mini-batch, and $\epsilon$ is a small constant added for numerical stability:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$
Notice that γ and β are learned during training, along with the original parameters of the network.
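
In Keras, this can be done by adding a BatchNormalization layer to the model. The following is a minimal sketch (the layer sizes, activations, and optimizer are assumptions for illustration), placing the layer between a hidden Dense layer's linear transformation and its activation:

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

model = Sequential()
model.add(Dense(32, input_dim=784))   # hidden layer (sizes are assumptions)
model.add(BatchNormalization())       # learns gamma and beta during training
model.add(Activation('relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Because γ and β are trainable, the network is free to rescale or even undo the normalization for a given layer if that yields a lower loss.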
