
Implementing and evaluating a random search policy

After some practice with PyTorch programming, from this recipe onward we will work on policies for the CartPole problem that are more sophisticated than purely random actions. We start with the random search policy.

A simple yet effective approach is to map an observation to a vector of two numbers, one score per action, and pick the action with the higher score. The linear mapping is represented by a weight matrix of size 4 x 2, since CartPole observations are 4-dimensional and there are two actions. In each episode, a weight matrix is randomly generated and used to compute the action at every step of that episode, and the total reward of the episode is then calculated. This process repeats for many episodes and, in the end, the weight that produced the highest total reward becomes the learned policy. The approach is called random search because the weight is picked at random in each trial, in the hope that a large number of trials will turn up a good weight.
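The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the recipe's implementation: it uses only the Python standard library, the `ToyEnv` class is a hypothetical stand-in with crude made-up dynamics (it is not the real CartPole environment), the weights are assumed to be drawn uniformly from [0, 1), and the names `random_weight`, `select_action`, `run_episode`, and `random_search` are chosen here for illustration.

```python
import random

N_STATE, N_ACTION = 4, 2  # 4-dimensional observation, 2 actions


def random_weight(rng):
    """Draw a 4 x 2 weight matrix (assumption: entries uniform in [0, 1))."""
    return [[rng.random() for _ in range(N_ACTION)] for _ in range(N_STATE)]


def select_action(state, weight):
    """Linear mapping: one score per action, then pick the higher score."""
    scores = [sum(s * weight[i][a] for i, s in enumerate(state))
              for a in range(N_ACTION)]
    return max(range(N_ACTION), key=lambda a: scores[a])


class ToyEnv:
    """Hypothetical stand-in environment, NOT the real CartPole dynamics."""

    def __init__(self, rng, max_steps=200):
        self.rng = rng
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(N_STATE)]
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = 1.0 if action == 1 else -1.0
        x_dot += 0.02 * force
        x += 0.02 * x_dot
        theta_dot += 0.02 * (10.0 * theta - force)  # crude: pole tips, push counteracts
        theta += 0.02 * theta_dot
        self.state = [x, x_dot, theta, theta_dot]
        self.t += 1
        done = abs(theta) > 0.2 or abs(x) > 2.4 or self.t >= self.max_steps
        return list(self.state), 1.0, done  # reward of 1 per surviving step


def run_episode(env, weight):
    """Use one fixed weight for every step of an episode; return total reward."""
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(select_action(state, weight))
        total_reward += reward
    return total_reward


def random_search(env, n_episode, rng):
    """Try a fresh random weight each episode and keep the best one seen."""
    best_weight, best_reward, rewards = None, float("-inf"), []
    for _ in range(n_episode):
        weight = random_weight(rng)
        total_reward = run_episode(env, weight)
        rewards.append(total_reward)
        if total_reward > best_reward:
            best_weight, best_reward = weight, total_reward
    return best_weight, best_reward, rewards


rng = random.Random(0)
best_weight, best_reward, rewards = random_search(ToyEnv(rng), 100, rng)
print("best total reward:", best_reward)
```

With PyTorch and a real CartPole environment, the same idea would presumably use `torch.rand(4, 2)` for the weight and `torch.argmax(torch.matmul(state, weight))` to pick the action; the search loop itself stays the same.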
