Vanishing gradients
During backpropagation, the gradient flows backward, from the final layer to the first layer. As it flows backward, it gets progressively smaller. Sometimes, the gradient becomes so small that the initial layers learn very slowly or stop learning completely. In this case, the gradient barely changes the weight values of the initial layers, so the training of the initial layers in the network is effectively stopped. This is known as the vanishing gradients problem.
This problem gets worse as we train deeper networks with gradient-based optimization methods. Gradient-based optimization methods adjust a parameter's value according to how much the network's output changes when we change the parameter's value by a small amount. If a small change in the parameter's value causes only a tiny change in the network's output, the computed gradient, and therefore the weight update, will be very small, and the network effectively stops learning.
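As a minimal sketch of this update rule (a hypothetical one-parameter example, not code from this book), each gradient-descent step moves the weight in proportion to the gradient, so a vanishingly small gradient means a vanishingly small update:

```python
# Hypothetical one-parameter example: minimize L(w) = (w - 3)^2
def gradient(w):
    return 2.0 * (w - 3.0)  # dL/dw

w = 0.0
learning_rate = 0.01

for step in range(3):
    grad = gradient(w)
    w -= learning_rate * grad  # gradient descent: w <- w - lr * dL/dw
    print(f"step {step}: grad = {grad:.4f}, w = {w:.4f}")
# When grad is vanishingly small, w barely moves and learning stalls.
```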
This is also a problem when we use saturating activation functions, such as Sigmoid and Tanh. The Sigmoid activation function squashes values to a range between 0 and 1, converting large positive values of x to approximately 1 and large negative values of x to approximately 0. The Tanh activation function squashes input values to a range between -1 and 1, converting large positive input values to approximately 1 and large negative values to approximately -1. When we apply backpropagation, we use the chain rule of differentiation, which has a multiplying effect: each layer's local derivative is multiplied into the gradient. Because these activation functions saturate (their derivatives approach zero for large-magnitude inputs; the sigmoid's derivative never exceeds 0.25), the gradient (the error) decreases exponentially as it reaches the initial layers of the network, causing the vanishing gradients problem.
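The multiplying effect is easy to see numerically. A minimal sketch (plain Python/NumPy, ignoring the weight matrices, which can shrink or grow the gradient further) that multiplies per-layer sigmoid derivatives through a 10-layer network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25, when x = 0

num_layers = 10
grad = 1.0  # gradient of the loss at the output layer

# Walk backward through the layers, multiplying in each layer's local
# derivative (chain rule). Pre-activations are all 0 here, the sigmoid's
# steepest point, i.e. the *best* case for gradient flow.
for layer in range(num_layers, 0, -1):
    grad *= sigmoid_derivative(0.0)
    print(f"layer {layer}: gradient magnitude = {grad:.2e}")

# Even in this best case the gradient shrinks by 4x per layer:
# after 10 layers it is 0.25 ** 10, roughly 9.5e-07.
```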
To overcome this problem, we can use activation functions such as ReLU, LeakyReLU, and PReLU. The gradients of these activation functions don't saturate for positive inputs during backpropagation, which allows neural networks to train efficiently. Another solution is to use batch normalization, which normalizes the inputs to the hidden layers of the network.
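As a minimal sketch of both fixes in Keras (layer sizes and the LeakyReLU slope here are arbitrary choices, not values from this book), each hidden block stacks Dense, BatchNormalization, and LeakyReLU, keeping the saturating sigmoid only at the output:

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, LeakyReLU

model = Sequential()
model.add(Dense(256, input_dim=100))
model.add(BatchNormalization())   # normalizes the inputs to the next layer
model.add(LeakyReLU(alpha=0.2))   # non-saturating: gradient is alpha,
                                  # not 0, for negative inputs
model.add(Dense(256))
model.add(BatchNormalization())
model.add(LeakyReLU(alpha=0.2))
model.add(Dense(1, activation='sigmoid'))  # sigmoid only at the output
model.summary()
```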