
Getting ready

We change each weight within the neural network by a small amount, one at a time. A change in a weight's value will have an impact on the final loss value (either increasing or decreasing the loss). We'll update each weight in the direction of decreasing loss.

Additionally, a small change in some weights increases or decreases the error considerably, while the same small change in other weights changes the error only slightly.

By updating each weight by a small amount and measuring the resulting change in error, we are able to do the following (a minimal sketch follows this list):

  • Determine the direction of the weight update
  • Determine the magnitude of the weight update
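For instance, here is a minimal sketch of how perturbing a weight reveals both quantities (a toy one-weight model, y_pred = w * x, is assumed here for illustration; this is not the book's code):

```python
import numpy as np

# Toy one-weight model: y_pred = w * x, with mean squared error as the loss
def loss(w, x, y):
    return np.mean((w * x - y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x                    # target values

w, delta = 1.0, 1e-4         # current weight and a small perturbation
change = loss(w + delta, x, y) - loss(w, x, y)

direction = -np.sign(change)        # move opposite to the loss increase
magnitude = abs(change) / delta     # approximate size of the slope
print(direction, magnitude)
```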

Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.

Intuitively, the learning rate helps us build trust in the algorithm. For example, when deciding on the magnitude of a weight update, we would not change the weight by a huge amount in one go, but would take a more careful approach and update the weights more slowly.

This results in a more stable model; we will look at how the learning rate helps with stability in the next chapter.
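As a small sketch of a learning-rate-scaled update (the gradient and weight values here are hypothetical):

```python
gradient = 4.0        # hypothetical slope of the loss w.r.t. a weight
w = 1.0               # current weight value
learning_rate = 0.01  # small steps keep the updates careful and stable

w = w - learning_rate * gradient
print(w)              # 0.96: the weight moves a little, not all at once
```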

This whole process of updating the weights to reduce the error is called gradient descent.

Stochastic gradient descent is the means by which the error is minimized in the preceding scenario. Intuitively, gradient stands for the difference (here, the difference between the actual and predicted values) and descent means to reduce it. Stochastic refers to the random selection of the samples on which each update decision is based.
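As a small illustration (the data and batch size here are assumed for this sketch), the stochastic part amounts to computing each update on a random subset of the data:

```python
import numpy as np

x = np.arange(100, dtype=float)
y = 2 * x

batch_size = 16                     # assumed value for illustration
idx = np.random.choice(len(x), size=batch_size, replace=False)
x_batch, y_batch = x[idx], y[idx]   # the update is computed on these samples only
```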

Apart from stochastic gradient descent, there are many other optimization techniques that help to minimize the loss value; the different optimization techniques will be discussed in the next chapter.

Back-propagation works as follows:

  • Calculates the overall cost function from the feedforward process.
  • Varies all the weights (one at a time) by a small amount.
  • Calculates the impact of the variation of weight on the cost function.
  • Depending on whether the change increased or decreased the cost (loss) value, updates the weight value in the direction of decreasing loss, and then repeats this step across all the weights we have.

If the preceding steps are performed n times, it essentially results in n epochs. A minimal sketch of this loop follows:
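This sketch assumes a toy two-weight model standing in for a network (illustrative only; the book's own code may differ):

```python
import numpy as np

# Toy two-weight model (w, b); the loop mirrors the steps listed above
def loss(weights, x, y):
    w, b = weights
    return np.mean((w * x + b - y) ** 2)    # overall cost after feedforward

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x

weights = [1.0, 0.0]
learning_rate, delta, n = 0.01, 1e-4, 100

for epoch in range(n):                       # performing the steps n times -> n epochs
    for i in range(len(weights)):            # vary one weight at a time
        base = loss(weights, x, y)
        weights[i] += delta                  # small variation in this weight
        change = loss(weights, x, y) - base  # impact on the cost function
        weights[i] -= delta                  # undo the variation
        # step in the direction that decreases the loss
        weights[i] -= learning_rate * change / delta

print(weights)                               # moves towards [2.0, 0.0]
```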

In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:

For now, we will have the known function as y = 2x, where we try to come up with the weight value and bias value, which are 2 and 0 in this specific case:

x: 1, 2, 3, 4
y: 2, 4, 6, 8

If we formulate the preceding dataset as a linear regression, y = a*x + b, we are trying to calculate the values of a and b (which we already know are 2 and 0, but we are checking how those values are obtained using gradient descent). Let's randomly initialize the a and b parameters to the values 1.477 and 0 (the ideal values being 2 and 0).
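A minimal sketch of this recovery, assuming a mean squared error loss and plain full-batch gradient descent (the exact code in the recipe may differ):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x                          # the known function y = 2x

a, b = 1.477, 0.0                  # the random initialization from the text
learning_rate = 0.01

for epoch in range(2000):
    error = a * x + b - y            # prediction error
    grad_a = 2 * np.mean(error * x)  # dL/da for the mean squared error
    grad_b = 2 * np.mean(error)      # dL/db
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(round(a, 3), round(b, 3))    # close to 2.0 and 0.0
```

With a small learning rate, a and b drift steadily from their initial values of 1.477 and 0 towards the ideal values of 2 and 0.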
