- Java Deep Learning Projects
- Md. Rezaul Karim
- 2021-06-18 19:08:00
Weight optimization
Before training starts, the network parameters are initialized randomly. Then, to optimize the network weights, an iterative algorithm called Gradient Descent (GD) is used. At each iteration, GD computes the gradient G of the error function E over the entire training set and uses it to update the weights.
In the following graph, the gradient G of the error function E gives the direction in which the error function, at the current parameter values, has the steepest slope. Since the ultimate target is to reduce the network error, GD takes small steps in the opposite direction, -G. This process is repeated many times so that the error E moves down toward the global minimum. The ultimate target is to reach a point where G = 0, where no further optimization is possible:

Searching for the minimum of the error function E: we move in the direction opposite to the gradient G of E, which is the direction in which E decreases fastest
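The iterative process described above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions, not the book's implementation: the one-dimensional error function E(w) = (w - 3)^2, the class name GradientDescentSketch, the learning rate, and the stopping tolerance are all hypothetical choices made to keep the example small.

```java
public class GradientDescentSketch {

    // Hypothetical error function E(w) = (w - 3)^2, whose minimum is at w = 3.
    static double error(double w) {
        return (w - 3.0) * (w - 3.0);
    }

    // Gradient G of E with respect to w: dE/dw = 2 * (w - 3).
    static double gradient(double w) {
        return 2.0 * (w - 3.0);
    }

    // Repeatedly steps in the direction -G until G is almost 0
    // or the iteration budget is exhausted.
    static double minimize(double initialWeight, double learningRate, int maxIterations) {
        double w = initialWeight;
        for (int i = 0; i < maxIterations; i++) {
            double g = gradient(w);
            if (Math.abs(g) < 1e-9) {
                break; // G is (nearly) 0: no further optimization is possible
            }
            w -= learningRate * g; // small step opposite to the gradient
        }
        return w;
    }

    public static void main(String[] args) {
        // Assumed starting point and learning rate, chosen for illustration only.
        double w = minimize(-4.0, 0.1, 1000);
        System.out.printf("w = %.6f, E(w) = %.6f%n", w, error(w));
    }
}
```

The learning rate controls the size of each step: a value that is too large can overshoot the minimum, while a value that is too small makes convergence slow.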
The downside of GD is that it takes too long to converge, which makes it impractical for large-scale training data. Therefore, a faster variant called Stochastic Gradient Descent (SGD) was proposed, which is also a widely used optimizer in DNN training. In SGD, we use only one training sample per iteration from the training set to update the network parameters.
SGD is not the only available optimization algorithm; many advanced optimizers are available nowadays, for example, Adam, RMSProp, AdaGrad, and Momentum. Most of them are direct or indirect refinements of SGD.
By the way, the term stochastic comes from the fact that the gradient based on a single training sample per iteration is a stochastic approximation of the true cost gradient.
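To make the single-sample update concrete, here is a minimal Java sketch of SGD fitting a line y = w * x + b. It is an illustration under assumptions, not code from the book: the class name SgdSketch, the toy data set, the squared-error loss, the learning rate, and the fixed random seed are all hypothetical.

```java
import java.util.Random;

public class SgdSketch {

    // Runs SGD on the model y = w * x + b and returns {w, b}.
    static double[] fit(double[] xs, double[] ys, double learningRate,
                        int iterations, long seed) {
        Random rng = new Random(seed);
        double w = rng.nextDouble(); // parameters start at random values
        double b = rng.nextDouble();
        for (int t = 0; t < iterations; t++) {
            int i = rng.nextInt(xs.length); // pick ONE training sample
            double residual = (w * xs[i] + b) - ys[i];
            // Gradient of the per-sample error 0.5 * residual^2;
            // a stochastic approximation of the full-batch gradient.
            w -= learningRate * residual * xs[i];
            b -= learningRate * residual;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        // Assumed noise-free toy data generated from y = 2x + 1.
        double[] xs = {0, 1, 2, 3, 4};
        double[] ys = {1, 3, 5, 7, 9};
        double[] params = fit(xs, ys, 0.02, 20000, 42L);
        System.out.printf("w = %.4f, b = %.4f%n", params[0], params[1]);
    }
}
```

Each update touches a single sample, so one iteration is much cheaper than a full pass over the training set; the price is a noisier path toward the minimum.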