
The back propagation function

Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as ŷ, while the actual value of the target variable is defined as y.

Once both y and ŷ are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function over all data points.
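As a concrete illustration, the following is a minimal sketch of that computation, assuming mean squared error as the loss (the function name mse_cost and the sample values are purely illustrative):

```python
import numpy as np

def mse_cost(y_hat, y):
    """Cost = the squared-error loss averaged over all data points."""
    return np.mean((y_hat - y) ** 2)

y = np.array([1.0, 0.0, 1.0])      # actual values
y_hat = np.array([0.9, 0.2, 0.8])  # network predictions
print(mse_cost(y_hat, y))          # 0.03
```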

In order for learning to occur, the network's error signal must be propagated backwards through the network layers, from the last layer to the first. The goal of back propagation is to carry this error signal backwards while using it to update the network weights as it travels. Mathematically, we do this by minimizing the cost function: nudging the weights towards values that make the cost as small as possible. This process is called gradient descent.

The gradient is the partial derivative of the cost function with respect to each weight in the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above it.

Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.
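To make this concrete, here is a minimal sketch of back propagation for a hypothetical two-layer network with sigmoid activations and a squared-error cost; the layer sizes and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative network: x -> hidden layer (sigmoid) -> output (sigmoid)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))   # one data point with 4 features
y = np.array([[1.0]])         # its actual value
W1 = rng.normal(size=(3, 4))  # hidden-layer weights
W2 = rng.normal(size=(1, 3))  # output-layer weights

# Forward propagation
z1 = W1 @ x
a1 = sigmoid(z1)
z2 = W2 @ a1
y_hat = sigmoid(z2)  # the network's prediction, y-hat

# Backward propagation: apply the chain rule layer by layer,
# reusing the gradient already computed for the layer above.
# With cost C = 0.5 * (y_hat - y)**2, dC/dz2 works out to:
dz2 = (y_hat - y) * y_hat * (1 - y_hat)
dW2 = dz2 @ a1.T                    # gradient of C w.r.t. W2
dz1 = (W2.T @ dz2) * a1 * (1 - a1)  # error signal pushed back a layer
dW1 = dz1 @ x.T                     # gradient of C w.r.t. W1
```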

Gradient descent repeats the following update until the network's error is minimized and the process has converged:

$$w := w - \alpha \frac{\partial C}{\partial w}$$

The gradient descent algorithm multiplies the gradient by a learning rate, called alpha (α), and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.
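A minimal sketch of a single update step (the function and values below are purely illustrative):

```python
import numpy as np

def gradient_descent_step(weights, gradient, alpha=0.1):
    """Subtract alpha times the gradient from each weight."""
    return weights - alpha * gradient

# Hypothetical weight matrix and its gradient from back propagation
W = np.array([[0.5, -0.3], [0.8, 0.1]])
dW = np.array([[0.2, -0.1], [0.05, 0.4]])
W = gradient_descent_step(W, dW)  # one step; repeated until convergence
```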
