Decaying alpha 

In a fully deterministic environment, we want to keep alpha at 1 at all times, since we already know that alpha = 1 causes the agent to learn the best policy for that environment. In a stochastic environment, however, including most of the environments we will be working in when we build Q-learning models, decaying alpha based on what we have already learned can allow our algorithm to converge faster: a smaller alpha means each new, noisy sample adjusts the estimate less, so the experience we have already accumulated is not overwritten by random fluctuations.
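To make this concrete, here is a minimal sketch of a tabular Q-learning update in which alpha decays with the number of times a state-action pair has been visited. The 1/(1 + visits) schedule and the min_alpha floor are illustrative choices for this sketch, not the only possible decay scheme:

```python
import numpy as np

def decayed_alpha(visits, min_alpha=0.01):
    """Learning rate that shrinks as a state-action pair is revisited.

    min_alpha is a floor that keeps learning from stopping entirely.
    """
    return max(min_alpha, 1.0 / (1.0 + visits))

def q_update(Q, counts, state, action, reward, next_state, gamma=0.9):
    """Apply one Q-learning update using the per-pair decayed alpha.

    Q and counts are NumPy arrays of shape (n_states, n_actions).
    """
    counts[state, action] += 1
    alpha = decayed_alpha(counts[state, action])
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Early visits to a pair use an alpha close to 1, so the estimate moves quickly; later visits use a smaller alpha, so the estimate settles toward an average over the noise.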

In practice, for a problem such as this, we are unlikely to decay alpha over the course of a run, as the benefit would be negligible. We will see this in action when we begin choosing values for the hyperparameters.

For the taxi problem, we are likely to start with an alpha value such as 0.1 and progressively compare it against higher values. We could also use a programmatic method, such as a cross-validated grid search, to identify the hyperparameter values that allow the algorithm to converge fastest.
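As a rough illustration, the sketch below trains a tabular Q-learning agent on Gym's Taxi-v3 environment for each candidate alpha and scores it by the average return over the final training episodes. The candidate grid, episode budget, epsilon value, and scoring metric are all assumptions made for this example, and it assumes the Gymnasium-style reset/step API:

```python
import numpy as np
import gymnasium as gym  # assumes the Gymnasium-style reset/step API

def evaluate_alpha(alpha, gamma=0.9, epsilon=0.1, episodes=2000):
    """Train tabular Q-learning on Taxi-v3 with a fixed alpha and
    return the mean reward over the last 100 training episodes."""
    env = gym.make("Taxi-v3")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        done, total = False, 0
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Standard Q-learning update with the candidate alpha
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action]
            )
            state, total = next_state, total + reward
        returns.append(total)
    env.close()
    return np.mean(returns[-100:])

# Illustrative sweep: start at 0.1 and compare progressively higher values
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"alpha={alpha}: mean return {evaluate_alpha(alpha):.1f}")
```

Note that "cross-validation" in a reinforcement learning setting usually means averaging each candidate's score over several random seeds, since there is no fixed dataset to split into folds; a fuller search would also sweep gamma and epsilon alongside alpha.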