
Vanishing and exploding gradients

These are very important issues in many deep neural networks: the deeper the architecture, the more likely it is to suffer from them. During the backpropagation stage, weights are adjusted in proportion to the gradient value, so we may have two different scenarios:

  • If the gradients are too small, this is called the vanishing gradients problem. It makes the learning process very slow or even stops the updates entirely. For example, when sigmoid is used as the activation function, its derivative is at most 0.25, so after a few layers of backpropagation the lower layers hardly receive any useful signal from the errors and the network is not updated properly.
  • If the gradients become too large, learning can diverge; this is called the exploding gradients problem. It often happens when the activation function is not bounded or the learning rate is too big. A small numerical sketch of both effects follows this list.
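The following is a minimal illustrative sketch, not a full training loop: it simulates how a gradient of 1.0 at the output shrinks or grows as it is multiplied, layer by layer, by the sigmoid derivative times a fixed scalar weight. The layer count (20) and the weight values (1.0 and 10.0) are arbitrary choices made only to show the two regimes.

```python
import numpy as np


def sigmoid_derivative(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)  # at most 0.25, reached at x = 0


np.random.seed(0)
num_layers = 20
pre_activations = np.random.randn(num_layers)

# Vanishing: with weights near 1, each layer multiplies the gradient by a
# sigmoid derivative <= 0.25, so the signal decays roughly like 0.25 ** depth.
vanishing = 1.0
# Exploding: with a deliberately large weight (10.0), the per-layer factor
# often exceeds 1 and the gradient keeps growing as we go deeper.
exploding = 1.0

for layer in reversed(range(num_layers)):
    d = sigmoid_derivative(pre_activations[layer])
    vanishing *= 1.0 * d
    exploding *= 10.0 * d
    print(f"layer {layer:2d}: vanishing = {vanishing:.3e}, "
          f"exploding = {exploding:.3e}")
```

Running the sketch shows the first quantity collapsing toward zero within a handful of layers, while the second quickly becomes very large, which is exactly the behavior that makes lower layers either stop learning or diverge.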