Fine-tuning your model – learning, discount, and exploration rates
Recall our discussion of the three major hyperparameters of a Q-learning model (the sketch after this list shows where each one appears in the agent's code):
- Alpha: The learning rate
- Gamma: The discount rate
- Epsilon: The exploration rate
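To make the roles of these three values concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. The table sizes (500 states, 6 actions, matching Gym's Taxi-v3) and the starting hyperparameter values are assumptions for illustration, not tuned settings:

```python
import numpy as np

n_states, n_actions = 500, 6           # Taxi-v3 sizes, assumed for illustration
Q = np.zeros((n_states, n_actions))    # tabular Q-values, initialized to zero

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative values, not tuned

def choose_action(state):
    """Epsilon-greedy selection: explore with probability epsilon."""
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)  # explore: random action
    return int(np.argmax(Q[state]))          # exploit: best-known action

def update(state, action, reward, next_state):
    """One Q-learning step: alpha scales the update, gamma discounts the future."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Alpha controls how far each update moves the old estimate toward the new target, gamma controls how heavily future rewards count in that target, and epsilon controls how often the agent deviates from its current best guess.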
What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover these values through experimentation once we have constructed our game environment, and we can also take advantage of existing research on the taxi problem and set the variables to known optimal values.
A large part of our model-tuning and optimization phase will consist of comparing the performance of different combinations of these three hyperparameters, as in the sketch below.
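One simple way to organize such a comparison is a grid search over candidate values. The candidate lists below are illustrative, and `run_trial` is a hypothetical placeholder for training a fresh agent and scoring it:

```python
from itertools import product

def run_trial(alpha, gamma, epsilon):
    """Hypothetical placeholder: train a fresh agent with these
    hyperparameters and return a score, e.g. mean reward over the
    final 100 episodes."""
    return 0.0  # stub so the sketch runs end to end

# Illustrative candidate values, not prescriptions from the book.
alphas = [0.1, 0.5, 0.9]
gammas = [0.6, 0.9, 0.99]
epsilons = [0.01, 0.1, 0.3]

results = {
    (a, g, e): run_trial(a, g, e)
    for a, g, e in product(alphas, gammas, epsilons)
}
best = max(results, key=results.get)
print("Best (alpha, gamma, epsilon):", best)
```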
One option that we have is the ability to decay any, or all, of these hyperparameters – in other words, to reduce their values as we progress through a game loop or conduct repeated trials. In practice, we will almost always decay epsilon, since we want our agent to adapt to the knowledge it has of its environment and explore less as it becomes better aware of the highest-valued actions to take. But it can sometimes be to our benefit to decay the other hyperparameters as well.
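As a rough sketch of epsilon decay, the loop below shrinks epsilon multiplicatively once per episode so the agent explores heavily at first and exploits more as it learns. The starting value, floor, and decay rate are illustrative assumptions rather than values from the book:

```python
epsilon = 1.0        # start fully exploratory (assumed starting value)
min_epsilon = 0.01   # floor so some exploration always remains (assumed)
decay_rate = 0.995   # multiplicative per-episode decay factor (assumed)

for episode in range(1000):
    # ... run one episode, choosing actions epsilon-greedily ...
    epsilon = max(min_epsilon, epsilon * decay_rate)
```

The same pattern works for alpha or gamma; only the variable being multiplied changes.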