官术网_书友最值得收藏!

How it works...

We are able to achieve much better performance with the hill-climbing algorithm than with random search by simply adding adaptive noise to each episode. We can think of it as a special kind of gradient descent without a target variable. The additional noise is the gradient, albeit in a random way. The noise scale is the learning rate, and it is adaptive to the reward from the previous episode. The target variable in hill climbing becomes achieving the highest reward. In summary, rather than isolating each episode, the agent in the hill-climbing algorithm makes use of the knowledge learned from each episode and performs a more reliable action in the next episode. As the name hill climbing implies, the reward moves upwards through the episodes as the weight gradually moves towards the optimum value.

主站蜘蛛池模板: 昌平区| 且末县| 泽州县| 岳池县| 福建省| 犍为县| 二连浩特市| 读书| 华蓥市| 灌南县| 芦溪县| 阿瓦提县| 团风县| 潍坊市| 邻水| 潞西市| 保定市| 嘉祥县| 梁山县| 明光市| 友谊县| 东乌珠穆沁旗| 星子县| 靖安县| 灵宝市| 赣榆县| 敦煌市| 邢台市| 民县| 牙克石市| 武山县| 宽甸| 南丹县| 竹溪县| 芮城县| 清涧县| 太仓市| 平昌县| 齐齐哈尔市| 敦煌市| 曲阳县|