
Fine-tuning your model – learning, discount, and exploration rates

Recall our discussion of the three major hyperparameters of a Q-learning model (the sketch following this list shows where each one enters the algorithm):

  • Alpha: The learning rate
  • Gamma: The discount rate
  • Epsilon: The exploration rate
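
To make these roles concrete, here is a minimal sketch of a single Q-learning step, assuming a tabular Q-table stored as a NumPy array; the variable names and the specific values are illustrative, not the settings we will ultimately choose:

import numpy as np
import random

# Illustrative starting values; we will tune these experimentally later.
alpha = 0.1    # learning rate: how strongly each new sample overwrites old estimates
gamma = 0.6    # discount rate: how much future rewards count relative to immediate ones
epsilon = 0.1  # exploration rate: probability of taking a random action

# For Gym's taxi environment, the table would typically be created as
# q_table = np.zeros([env.observation_space.n, env.action_space.n])

def choose_action(q_table, state, n_actions):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(q_table[state]))

def q_update(q_table, state, action, reward, next_state):
    # Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max Q(s', a'))
    best_next = np.max(q_table[next_state])
    q_table[state, action] = (1 - alpha) * q_table[state, action] + \
                             alpha * (reward + gamma * best_next)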

What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover these values through experimentation once we have constructed our game environment; we can also take advantage of existing research on the taxi problem and set the variables to known optimal values. 

A large part of our model-tuning and optimization phase will consist of comparing the performance of different combinations of these three hyperparameters, as in the sketch below. 
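
One straightforward way to run those comparisons is a simple grid search over candidate values. The sketch below is a skeleton of that process; the candidate values are arbitrary examples, and train_and_evaluate is a hypothetical placeholder for the training loop we build later:

from itertools import product

def train_and_evaluate(alpha, gamma, epsilon):
    # Hypothetical placeholder: the real version would train the agent with
    # these hyperparameters and return a score, such as the mean reward
    # per episode over an evaluation run.
    return 0.0

# Candidate values are illustrative only.
alphas = [0.1, 0.5, 0.9]
gammas = [0.6, 0.9, 0.99]
epsilons = [0.05, 0.1, 0.3]

results = {
    (a, g, e): train_and_evaluate(a, g, e)
    for a, g, e in product(alphas, gammas, epsilons)
}

best = max(results, key=results.get)
print("Best (alpha, gamma, epsilon):", best)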

One option that we have is the ability to decay any, or all, of these hyperparameters – in other words, to reduce their values as we progress through a game loop or conduct repeated trials. In practice, we will almost always decay epsilon, since we want our agent to rely increasingly on the knowledge it has of its environment and explore less as it learns which actions carry the highest values. But it can sometimes be to our benefit to decay the other hyperparameters as well.
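
A common way to implement decay is a multiplicative schedule with a floor, so that exploration shrinks after every episode but never disappears entirely. The schedule below is one conventional choice, with assumed starting, floor, and decay values rather than the specific numbers this chapter will settle on:

epsilon = 1.0          # start fully exploratory
epsilon_min = 0.01     # keep a small amount of exploration forever
epsilon_decay = 0.995  # multiplicative decay applied once per episode

for episode in range(1000):
    # ... run one full episode with the current epsilon here ...
    # After the episode, shrink epsilon toward its floor.
    epsilon = max(epsilon_min, epsilon * epsilon_decay)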
