
How it works...

In this recipe, we print out the state array at every step. But what does each float in the array mean? We can find more information about CartPole on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. It turns out that the four floats represent the following:

  • Cart position: This ranges from -2.4 to 2.4; any position outside this range terminates the episode.
  • Cart velocity.
  • Pole angle: Any value less than -0.209 radians (-12 degrees) or greater than 0.209 radians (12 degrees) terminates the episode.
  • Pole velocity at the tip.
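
To make the four fields and the termination limits concrete, here is a minimal sketch that gives each float a name and checks the two limits listed above. `CartPoleState` and `is_terminal` are hypothetical helpers written for illustration, not part of Gym; the thresholds are the rounded values quoted from the wiki (Gym's own code computes the angle limit as 12 degrees in radians, roughly 0.2094).

```python
from typing import NamedTuple

# Limits as quoted above (assumed rounded values, not read from Gym itself).
X_THRESHOLD = 2.4          # cart position limit
THETA_THRESHOLD = 0.209    # pole angle limit in radians (about 12 degrees)


class CartPoleState(NamedTuple):
    """Hypothetical named view of the four floats in a CartPole observation."""
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_tip_velocity: float


def is_terminal(state: CartPoleState) -> bool:
    """Return True if the cart or the pole is outside its allowed range."""
    return (abs(state.cart_position) > X_THRESHOLD
            or abs(state.pole_angle) > THETA_THRESHOLD)


if __name__ == "__main__":
    raw = [0.03, -0.02, 0.01, 0.04]   # made-up example observation
    s = CartPoleState(*raw)
    print(s.pole_angle, is_terminal(s))
```

Only the position and angle participate in the check; the two velocities carry no termination limit in the description above.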

The action is either 0 or 1, which pushes the cart to the left or to the right, respectively.
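The action encoding can be illustrated with a tiny heuristic policy that pushes the cart toward the side the pole is leaning. This is a hypothetical example, not the recipe's method, and it assumes that a positive pole angle means the pole tilts to the right; `lean_policy` and `ACTION_NAMES` are names made up for this sketch.

```python
# Action encoding described above: 0 pushes the cart left, 1 pushes it right.
PUSH_LEFT = 0
PUSH_RIGHT = 1

ACTION_NAMES = {PUSH_LEFT: "push left", PUSH_RIGHT: "push right"}


def lean_policy(pole_angle: float) -> int:
    """Hypothetical heuristic: push the cart toward the side the pole leans.

    Assumes a positive pole angle means the pole tilts to the right.
    """
    return PUSH_RIGHT if pole_angle > 0 else PUSH_LEFT


for angle in (-0.05, 0.05):
    action = lean_policy(angle)
    print(f"angle={angle:+.2f} -> action {action} ({ACTION_NAMES[action]})")
```

In a real Gym loop, the integer returned here is what would be passed to `env.step()`.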

The reward in this environment is +1 for every timestep until the episode terminates, which we can verify by printing out the reward at each step. The total reward is therefore simply the number of timesteps in the episode.
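The relationship between total reward and episode length can be checked with a small rollout loop. To keep the sketch self-contained, it uses a hypothetical stand-in for `env.step` (`make_toy_step`) that returns +1 per step and signals termination after a fixed number of steps; the `max_steps=200` default is an assumption chosen to mirror CartPole-v0's 200-step episode cap.

```python
from typing import Callable, Tuple

# A step function returns (reward, done), loosely mimicking part of env.step().
StepFn = Callable[[], Tuple[float, bool]]


def run_episode(step: StepFn, max_steps: int = 200) -> Tuple[float, int]:
    """Roll out one episode and return (total_reward, num_steps)."""
    total_reward = 0.0
    num_steps = 0
    for _ in range(max_steps):
        reward, done = step()
        total_reward += reward
        num_steps += 1
        if done:
            break
    return total_reward, num_steps


def make_toy_step(episode_length: int) -> StepFn:
    """Hypothetical stand-in for env.step: +1 per step, done after episode_length steps."""
    count = 0

    def step() -> Tuple[float, bool]:
        nonlocal count
        count += 1
        return 1.0, count >= episode_length

    return step


total, steps = run_episode(make_toy_step(episode_length=17))
print(total, steps)
```

Because every step, including the one that ends the episode, yields +1, the accumulated reward equals the step count.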
