
  • Deep Learning Essentials
  • Wei Di, Anurag Bhardwaj, Jianing Wei

Choosing the right activation function

In most cases, ReLU should be your first choice, but keep in mind that it should only be applied to hidden layers. If your model suffers from dead neurons, consider lowering the learning rate, or try Leaky ReLU or maxout.
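As a minimal illustration (a NumPy sketch, not the book's code), these variants can be written directly from their definitions; the `alpha` slope in `leaky_relu` and the two-piece form of `maxout` are illustrative choices:

```python
import numpy as np

def relu(x):
    # Standard ReLU: zero for negative inputs, identity otherwise.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope (alpha) for negative inputs,
    # so "dead" units still receive a nonzero gradient.
    return np.where(x > 0, x, alpha * x)

def maxout(x, w1, b1, w2, b2):
    # A two-piece maxout unit: the maximum of two learned affine functions.
    return np.maximum(x @ w1 + b1, x @ w2 + b2)
```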

Using either sigmoid or tanh is not recommended, as both suffer from the vanishing gradient problem and converge slowly. Take sigmoid as an example: its derivative is at most 0.25 everywhere, so every factor it contributes during backpropagation shrinks the gradient further. ReLU, by contrast, has a derivative of exactly one for every positive input, which keeps gradients from shrinking and yields a more stable network.
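The effect is easy to see numerically. The sketch below (an illustrative example, not from the book) multiplies the per-layer derivative factors across a 10-layer chain: even in sigmoid's best case the product collapses toward zero, while ReLU's factor of one leaves the gradient magnitude unchanged:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

layers = 10
# Best case for sigmoid: every pre-activation is 0, so each factor is 0.25.
sigmoid_chain = sigmoid_derivative(0.0) ** layers  # 0.25**10 ≈ 9.5e-7
# ReLU on positive inputs: each factor is exactly 1.
relu_chain = 1.0 ** layers                          # 1.0

print(sigmoid_chain, relu_chain)
```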

Now that you have gained a basic understanding of the key components of neural networks, let's move on to how networks learn from data.
