Developing a policy gradient algorithm

The last recipe of the first chapter is about solving the CartPole environment with a policy gradient algorithm. This may be more sophisticated than we need for this simple problem, for which the random search and hill-climbing algorithms suffice. However, it is a great algorithm to learn, and we will use it in more complicated environments later in the book.

In the policy gradient algorithm, the model weight is moved in the direction of the gradient at the end of each episode. We will explain the computation of gradients in the next section. In addition, at each step, the algorithm samples an action from the policy according to the probabilities computed from the current state and the weight. It no longer takes an action with certainty, as in random search and hill climbing (where the action achieving the higher score is always chosen). Hence, the policy switches from deterministic to stochastic.
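To make this concrete, here is a minimal sketch of such a stochastic policy with an end-of-episode gradient update. It assumes a linear softmax policy, plain NumPy, and the classic Gym API for CartPole-v0 (a four-tuple return from step); the helper names (softmax, run_episode) and the choice to scale each step's gradient by the episode's total reward are illustrative assumptions, not the book's implementation, which is developed in the next section.

```python
import gym
import numpy as np

def softmax(z):
    """Numerically stable softmax over action scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def run_episode(env, weight):
    """Roll out one episode, sampling actions from the softmax policy
    and recording the gradient of log pi(a|s) at every step."""
    state = env.reset()
    grads, total_reward, done = [], 0.0, False
    while not done:
        probs = softmax(state @ weight)               # action probabilities
        action = int(np.random.choice(len(probs), p=probs))
        # Gradient of log-softmax w.r.t. the weight matrix for a linear policy:
        # d log p(a|s) / dW = outer(s, onehot(a) - probs)
        dlog = -probs
        dlog[action] += 1.0
        grads.append(np.outer(state, dlog))
        state, reward, done, _ = env.step(action)     # classic Gym 4-tuple API
        total_reward += reward
    return total_reward, grads

env = gym.make('CartPole-v0')
n_state = env.observation_space.shape[0]   # 4 state variables
n_action = env.action_space.n              # 2 actions: push left or right
weight = np.random.rand(n_state, n_action)
learning_rate = 0.001                      # illustrative value

for episode in range(1000):
    total_reward, grads = run_episode(env, weight)
    # Move the weight in the gradient direction at the end of the episode,
    # scaling by the return so good episodes are reinforced (an assumption
    # here; the exact weighting is derived in the next section).
    for grad in grads:
        weight += learning_rate * grad * total_reward
    print(f'Episode {episode}: {total_reward}')
```

Note the two ingredients the paragraph above describes: actions are sampled from the probabilities rather than taken greedily, and the weight is updated only once per episode, in the direction of the accumulated log-probability gradients.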
