- Deep Learning with Microsoft Cognitive Toolkit Quick Start Guide
- Willem Meints
Optimizing a neural network
We've talked about making predictions with neural networks. We haven't yet talked about how to optimize the parameters in a neural network. Let's go over each of the components in a neural network and explore how they work together when we train it:

A neural network has several layers that are connected together. Each layer has a set of trainable parameters that we want to optimize. We optimize a neural network using a technique called backpropagation: we aim to minimize the output of a loss function by gradually adjusting the values of the parameters w1, w2, and w3 shown in the preceding diagram.
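To make this concrete, the following is a minimal sketch of such a network using CNTK's Python API (CNTK 2.x is assumed, and the layer sizes are illustrative rather than taken from the diagram). Each Dense layer carries a weight matrix and a bias vector; together these form the trainable parameters that the optimizer will adjust:

```python
import cntk as C

# Input with two features; the shape is purely illustrative.
features = C.input_variable(2)

# Three dense layers, each holding trainable weights (the w1, w2, w3
# from the diagram) plus a bias term.
model = C.layers.Sequential([
    C.layers.Dense(4, activation=C.relu),
    C.layers.Dense(4, activation=C.relu),
    C.layers.Dense(1)
])
z = model(features)

# Every parameter starts out with (pseudo)random initial values
# and will be adjusted during training.
print([p.shape for p in z.parameters])
```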
The loss function for a neural network can take many shapes. Typically, we choose a function that expresses the difference between the expected output, Y, and the actual output produced by the neural network. For example, we could use the following loss function:
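One common choice is the squared error between the expected output Y and the network's prediction. A small NumPy sketch of that idea (the function and variable names here are illustrative, not from the book):

```python
import numpy as np

def squared_error_loss(y_expected, y_predicted):
    # Sum of squared differences between the expected output
    # and the output the network actually produced.
    return np.sum((y_expected - y_predicted) ** 2)

# The further the prediction is from the target, the larger the loss.
print(squared_error_loss(np.array([1.0]), np.array([0.8])))  # ~0.04
print(squared_error_loss(np.array([1.0]), np.array([0.2])))  # ~0.64
```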

Firstly, the neural network is initialized. We can do this by assigning random values to all of the parameters in the model.
After we initialize the neural network, we feed data into the neural network to make a prediction. We then feed the prediction together with the expected output into a loss function to measure how close the model is to what we expect it to be.
The feedback from the loss function is fed into an optimizer. The optimizer uses a technique called gradient descent to work out how to adjust each of the parameters.
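In CNTK, these three steps map onto a model with randomly initialized parameters, a loss function, and a learner wrapped in a Trainer. The following is a minimal sketch, assuming CNTK 2.x; the layer sizes, learning rate, and data are illustrative:

```python
import numpy as np
import cntk as C

features = C.input_variable(2)
label = C.input_variable(1)

# Step 1: a small model whose parameters start at random values.
z = C.layers.Sequential([
    C.layers.Dense(4, activation=C.relu),
    C.layers.Dense(1)
])(features)

# Step 2: the loss compares the prediction with the expected output.
loss = C.squared_error(z, label)

# Step 3: the learner applies gradient descent to the parameters,
# driven by the feedback from the loss function. Depending on the
# CNTK version, lr may need to be wrapped in a learning-rate schedule.
learner = C.sgd(z.parameters, lr=0.1)
trainer = C.Trainer(z, (loss, None), [learner])

x = np.array([[0.1, 0.2]], dtype=np.float32)
y = np.array([[0.5]], dtype=np.float32)
trainer.train_minibatch({features: x, label: y})
print(trainer.previous_minibatch_loss_average)
```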
Gradient descent is a key ingredient of neural network optimization, and it works because of an interesting property of the loss function. When you visualize the output of the loss function for one set of inputs across different values of the parameters in the neural network, you end up with a plot that looks similar to this:

At the beginning of the backpropagation process, we start somewhere on one of the slopes in this mountain landscape. Our aim is to walk down the mountain toward the point where the values of the parameters are at their best, which is where the output of the loss function is at its minimum.
To find our way down the mountain, we need a function that expresses the slope at our current position. We get this by taking the derivative of the loss function. The derivative gives us the gradients for the parameters in the model.
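To see what this looks like in the simplest possible case, take a model with a single parameter w and the squared-error loss (y - w·x)². Its derivative with respect to w is -2x(y - w·x), and it matches a numerical estimate of the slope. A toy NumPy check (all values are illustrative):

```python
import numpy as np

x, y = 2.0, 1.0   # a single training example
w = 0.3           # the current value of the parameter

def loss(w):
    return (y - w * x) ** 2

# Analytic gradient of the loss with respect to w.
grad = -2 * x * (y - w * x)

# Numerical estimate of the same slope using finite differences.
eps = 1e-6
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad, grad_numeric)  # both are roughly -1.6
```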
When we perform one pass of the backpropagation process, we take one step down the mountain using the gradients of the parameters. We do this by subtracting a fraction of each gradient from its parameter. Following the slope blindly is dangerous, though: if we move too fast, we might overshoot the optimum spot. Therefore, all neural network optimizers have a setting called the learning rate, which controls the rate of descent.
Because we can only take small steps in the gradient-descent algorithm, we need to repeat this process many times to reach the optimum values for the neural network parameters.
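Putting these last two points together: each step subtracts the gradient, scaled by the learning rate, from the parameter, and the step is repeated until the loss stops shrinking. A toy sketch of this loop in plain Python (the starting value, learning rate, and iteration count are illustrative):

```python
x, y = 2.0, 1.0
w = 0.0                  # starting point on the mountain slope
learning_rate = 0.05     # controls the size of each downhill step

for step in range(50):
    prediction = w * x
    gradient = -2 * x * (y - prediction)   # slope of the loss at w
    w -= learning_rate * gradient          # take one small step downhill

print(w)  # approaches 0.5, the value that minimizes the loss
```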