Fine-tuning your model – learning, discount, and exploration rates
Recall our discussion of the three major hyperparameters of a Q-learning model (the sketch after this list shows where each one appears in the agent's code):
- Alpha: The learning rate
- Gamma: The discount rate
- Epsilon: The exploration rate
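To make the roles of these three values concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. The table sizes (500 states, 6 actions, matching Gym's Taxi-v3) and the starting hyperparameter values are assumptions for illustration, not tuned settings:

```python
import numpy as np

n_states, n_actions = 500, 6           # Taxi-v3 sizes, assumed for illustration
Q = np.zeros((n_states, n_actions))    # tabular Q-values, initialized to zero

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative values, not tuned

def choose_action(state):
    """Epsilon-greedy selection: explore with probability epsilon."""
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)  # explore: random action
    return int(np.argmax(Q[state]))          # exploit: best-known action

def update(state, action, reward, next_state):
    """One Q-learning step: alpha scales the update, gamma discounts the future."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Alpha controls how far each update moves the old estimate toward the new target, gamma controls how heavily future rewards count in that target, and epsilon controls how often the agent deviates from its current best guess.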
What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover these values through experimentation once we have constructed our game environment, and we can also take advantage of existing research on the taxi problem and set the variables to known optimal values.
A large part of our model-tuning and optimization phase will consist of comparing the performance of different combinations of these three hyperparameters, as in the sketch below.
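One simple way to organize such a comparison is a grid search over candidate values. The candidate lists below are illustrative, and `run_trial` is a hypothetical placeholder for training a fresh agent and scoring it:

```python
from itertools import product

def run_trial(alpha, gamma, epsilon):
    """Hypothetical placeholder: train a fresh agent with these
    hyperparameters and return a score, e.g. mean reward over the
    final 100 episodes."""
    return 0.0  # stub so the sketch runs end to end

# Illustrative candidate values, not prescriptions from the book.
alphas = [0.1, 0.5, 0.9]
gammas = [0.6, 0.9, 0.99]
epsilons = [0.01, 0.1, 0.3]

results = {
    (a, g, e): run_trial(a, g, e)
    for a, g, e in product(alphas, gammas, epsilons)
}
best = max(results, key=results.get)
print("Best (alpha, gamma, epsilon):", best)
```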
One option that we have is the ability to decay any, or all, of these hyperparameters – in other words, to reduce their values as we progress through a game loop or conduct repeated trials. In practice, we will almost always decay epsilon, since we want our agent to adapt to the knowledge it has of its environment and explore less as it becomes better aware of the highest-valued actions to take. But it can sometimes be to our benefit to decay the other hyperparameters as well.
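As a rough sketch of epsilon decay, the loop below shrinks epsilon multiplicatively once per episode so the agent explores heavily at first and exploits more as it learns. The starting value, floor, and decay rate are illustrative assumptions rather than values from the book:

```python
epsilon = 1.0        # start fully exploratory (assumed starting value)
min_epsilon = 0.01   # floor so some exploration always remains (assumed)
decay_rate = 0.995   # multiplicative per-episode decay factor (assumed)

for episode in range(1000):
    # ... run one episode, choosing actions epsilon-greedily ...
    epsilon = max(min_epsilon, epsilon * decay_rate)
```

The same pattern works for alpha or gamma; only the variable being multiplied changes.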