- Deep Learning Quick Reference
- Mike Bernico
Using momentum with gradient descent
Gradient descent with momentum speeds up learning by accelerating updates in directions where the gradient has been consistent, while damping updates in directions where the gradient fluctuates. In effect, it allows gradient descent to build velocity.
Momentum works by introducing a velocity term and using a weighted moving average of that term in the update rule, as follows:
$$v_t = \beta v_{t-1} + (1 - \beta)\,\nabla_\theta J(\theta)$$
$$\theta = \theta - \alpha v_t$$
Most typically, β is set to 0.9 in the case of momentum, and usually this is not a hyperparameter that needs to be changed.
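As a minimal NumPy sketch of the update rule above, the following implements momentum for a toy one-dimensional objective; the quadratic function, learning rate, and step count are illustrative assumptions, not values from the text:

```python
import numpy as np

def gradient_descent_with_momentum(grad_fn, theta, alpha=0.1, beta=0.9, steps=200):
    """Minimize an objective using the weighted-moving-average momentum update."""
    v = np.zeros_like(theta)  # velocity term, initialized to zero
    for _ in range(steps):
        g = grad_fn(theta)
        v = beta * v + (1 - beta) * g  # weighted moving average of gradients
        theta = theta - alpha * v      # step parameters along the velocity
    return theta

# Illustrative example: minimize f(theta) = theta^2, whose gradient is 2*theta
theta0 = np.array([5.0])
theta_min = gradient_descent_with_momentum(lambda t: 2 * t, theta0)
print(theta_min)  # approaches 0
```

In Keras, momentum is typically enabled by passing `momentum` to the `SGD` optimizer, for example `SGD(learning_rate=0.01, momentum=0.9)`. Note that Keras implements the `v = momentum * v - lr * g` variant rather than the weighted-average form above; the two produce the same trajectory up to a rescaling of the learning rate.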