
Getting ready

We change each weight within the neural network by a small amount, one at a time. A change in the weight value will have an impact on the final loss value (either increasing or decreasing it). We'll update the weight in the direction that decreases the loss.

Additionally, in some scenarios a small change in weight increases or decreases the error considerably, while in other cases the error changes only by a small amount.

By updating each weight by a small amount and measuring the change in error that this update leads to, we are able to do the following (see the sketch after this list):

  • Determine the direction of the weight update
  • Determine the magnitude of the weight update
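
To make this concrete, here is a minimal Python sketch of perturbing a single weight and measuring the resulting change in loss; the toy dataset, the squared-error loss, and the perturbation size are illustrative assumptions rather than the book's code:

    import numpy as np

    # Toy dataset for y = 2x and a squared-error loss on a single weight w.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.0, 6.0, 8.0])

    def loss(w):
        # Mean squared error of the prediction w * x against the target y.
        return np.mean((w * x - y) ** 2)

    w = 1.0        # current weight value
    delta = 1e-4   # the "small amount" by which we perturb the weight

    change_in_loss = loss(w + delta) - loss(w)
    # The sign of the change gives the direction of the update (here the loss
    # decreases when w increases, so w should be increased), and its size
    # relative to delta gives the magnitude of the update.
    print(change_in_loss / delta)   # an estimate of the slope d(loss)/dw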

Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.

Intuitively, the learning rate helps us to build trust in the algorithm. For example, when deciding on the magnitude of the weight update, we would not want to change a weight by a huge amount in one go, but would rather take a more careful approach and update the weights more slowly.

This gives our model stability; we will look at how the learning rate helps with stability in the next chapter.
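
For example, a minimal sketch of a learning-rate-scaled update (the numbers here are illustrative assumptions, not values from the book):

    # Assume we have already estimated the slope of the loss with respect to a weight.
    w = 1.0
    gradient = -4.0          # the loss decreases as w increases
    learning_rate = 0.01     # a small learning rate keeps each update cautious

    # Take a small, stable step in the direction that decreases the loss.
    w = w - learning_rate * gradient
    print(w)                 # 1.04, rather than jumping a long way in one go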

The whole process by which we update weights to reduce error is called a gradient-descent technique.

Stochastic gradient descent is the means by which error is minimized in the preceding scenario. More intuitively, gradient stands for difference (here, the difference between actual and predicted values) and descent means reduce. Stochastic stands for the selection of a random sample of data points based on which a weight-update decision is taken.

Apart from stochastic gradient descent, there are many other optimization techniques that help to minimize the loss value; the different optimization techniques will be discussed in the next chapter.
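
As a rough sketch of the stochastic part of stochastic gradient descent, the following (with an illustrative dataset, batch size, and learning rate of our own choosing) estimates the gradient on a randomly selected subset of the data at every step:

    import numpy as np

    x = np.arange(1.0, 11.0)    # inputs 1..10
    y = 2 * x                   # targets from the known function y = 2x

    w, learning_rate, batch_size = 0.0, 0.01, 4

    for step in range(1000):
        idx = np.random.choice(len(x), size=batch_size, replace=False)  # random samples
        xb, yb = x[idx], y[idx]
        grad = np.mean(2 * (w * xb - yb) * xb)   # slope of the squared error on the batch
        w -= learning_rate * grad                # descend towards a lower loss

    print(w)   # approaches 2.0, the slope of y = 2x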

Back-propagation works as follows:

  • Calculates the overall cost function from the feedforward process.
  • Varies all the weights (one at a time) by a small amount.
  • Calculates the impact of the variation of weight on the cost function.
  • Depending on whether the change has increased or decreased the cost (loss) value, it updates the weight value in the direction of decreasing loss. It then repeats this step across all the weights we have.

If the preceding steps are performed n times, it essentially results in n epochs.
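
A sketch of these steps for a tiny one-hidden-layer network follows; the architecture, data, learning rate, and perturbation size are illustrative assumptions rather than the book's code, and the gradients are estimated by the weight-perturbation idea described above rather than analytically:

    import numpy as np

    np.random.seed(0)

    x = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([[2.0], [4.0], [6.0], [8.0]])

    w_hidden = np.random.randn(1, 3) * 0.5   # weights of the hidden layer
    w_output = np.random.randn(3, 1) * 0.5   # weights of the output layer
    learning_rate, delta = 0.01, 1e-4

    def cost(wh, wo):
        # Overall cost (mean squared error) from the feedforward process.
        return np.mean(((x @ wh) @ wo - y) ** 2)

    print('cost before training:', cost(w_hidden, w_output))
    for epoch in range(500):                  # performing the steps n times -> n epochs
        for weights in (w_hidden, w_output):
            for i in np.ndindex(weights.shape):
                original = weights[i]
                base = cost(w_hidden, w_output)
                weights[i] = original + delta          # vary this one weight slightly
                perturbed = cost(w_hidden, w_output)   # impact of the variation on the cost
                # Update in the direction that decreases the loss.
                weights[i] = original - learning_rate * (perturbed - base) / delta
    print('cost after training:', cost(w_hidden, w_output))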

In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:

For now, we will have the known function as y = 2x, where we try to come up with the weight value and bias value, which are 2 and 0 in this specific case:

 

If we formulate the preceding dataset as a linear regression, y = a*x + b, we are trying to calculate the values of a and b (which we already know are 2 and 0, but we are checking how those values can be obtained using gradient descent). Let's randomly initialize the a and b parameters to values of 1.477 and 0.
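
A small sketch of how gradient descent can recover these values, with an illustrative dataset and learning rate (only the starting values a = 1.477 and b = 0 come from the text above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2 * x                        # the known function y = 2x

    a, b = 1.477, 0.0                # initial parameter values from the text
    learning_rate = 0.01

    for epoch in range(2000):
        y_pred = a * x + b
        error = y_pred - y
        # Slopes of the mean squared error with respect to a and b.
        grad_a = np.mean(2 * error * x)
        grad_b = np.mean(2 * error)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b

    print(a, b)                      # converges towards the known values 2 and 0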
