Developing the hill-climbing algorithm

As we saw with the random search policy, each episode is independent of the others. In fact, all episodes in random search could be run in parallel, and the weight that achieves the best performance is the one eventually selected. We also verified this with the plot of reward versus episode, which shows no upward trend. In this recipe, we will develop a different algorithm, the hill-climbing algorithm, which transfers the knowledge acquired in one episode to the next.

In the hill-climbing algorithm, we also start with a randomly chosen weight. In each episode, however, we add some noise to the current weight and run the episode with the perturbed weight. If the total reward improves, we replace the weight with the new one; otherwise, we keep the old weight. With this approach, the weight is gradually improved as we progress through the episodes, instead of jumping around from one episode to the next.
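The update rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the recipe's implementation: `run_episode` is a hypothetical stand-in that scores a weight with a simple deterministic function rather than running a real environment, and `hill_climb`, `noise_scale`, `n_weights`, and `seed` are names chosen here for the example.

```python
import random


def run_episode(weight):
    """Hypothetical stand-in for running one episode with the given weight.

    Assumption: the "total reward" is a toy function that is highest when
    the weight matches a fixed target. A real episode would instead step
    through an environment and sum the rewards it returns.
    """
    target = [0.5, -0.3, 0.8, 0.1]
    return -sum((w - t) ** 2 for w, t in zip(weight, target))


def hill_climb(n_episodes=200, noise_scale=0.1, n_weights=4, seed=0):
    rng = random.Random(seed)

    # Start with a randomly chosen weight, as in random search.
    best_weight = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
    best_reward = run_episode(best_weight)
    history = [best_reward]

    for _ in range(n_episodes):
        # Add some noise to the current best weight.
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_weight]
        reward = run_episode(candidate)

        # Keep the new weight only if the total reward improves;
        # otherwise keep the old weight.
        if reward > best_reward:
            best_weight, best_reward = candidate, reward

        history.append(best_reward)

    return best_weight, best_reward, history


if __name__ == "__main__":
    weight, reward, history = hill_climb()
    print("best reward at start:", history[0])
    print("best reward at end:  ", reward)
```

Because a candidate is accepted only when it beats the current best, the recorded best reward never decreases from one episode to the next. In this toy setting the reward is deterministic; with a real environment, the reward for the same weight can vary between episodes, so a single lucky episode may be enough for a weight to be accepted.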
