Book title: Hands-On Q-Learning with Python
Author: Nazia Habib
Chapter word count: 154
Last updated: 2021-06-24 15:13:17

Decaying gamma

Decaying gamma makes the agent prioritize short-term rewards as it learns what those rewards are, and put less emphasis on long-term rewards.

Remember that a gamma value of 0 causes an agent to disregard future values entirely and focus only on current rewards, while a gamma value of 1 causes it to weight future values the same as current ones. Decaying gamma, therefore, shifts the agent's focus toward current rewards and away from future rewards.
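To make this concrete, here is a minimal sketch of how a decaying gamma changes the value the agent assigns to the same transition. The text does not specify a decay schedule, so the exponential per-episode decay below is an assumption, and the names `decayed_gamma`, `td_target`, `decay_rate`, and `min_gamma` are hypothetical, chosen only for illustration.

```python
def decayed_gamma(initial_gamma, decay_rate, episode, min_gamma=0.0):
    """Return gamma for a given episode.

    Assumed schedule: exponential decay per episode, clipped so that
    gamma never drops below min_gamma.
    """
    return max(min_gamma, initial_gamma * (decay_rate ** episode))


def td_target(reward, max_next_q, gamma):
    """Q-learning target: immediate reward plus discounted best next-state value."""
    return reward + gamma * max_next_q


# The same transition (reward 1, best next-state value 10) is valued
# differently as gamma decays over episodes 0, 1, and 2.
reward, max_next_q = 1.0, 10.0
gammas = [decayed_gamma(0.9, 0.5, episode) for episode in (0, 1, 2)]
targets = [td_target(reward, max_next_q, g) for g in gammas]

for episode, (g, t) in enumerate(zip(gammas, targets)):
    print(f"episode {episode}: gamma={g:.3f}, target={t:.3f}")
```

With these illustrative numbers, the future term contributes less in each successive episode, so the target moves closer to the immediate reward alone. In practice the decay rate and the floor would need tuning for the task at hand.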

Intuitively, this benefits us because the closer we get to our goal, the more we want to take advantage of short-term rewards instead of holding out for future rewards that won't be available after we complete the task. We can reach our goal faster and more efficiently by adapting how we use the resources available to us as their availability changes.
