
There's more...

So far, we've run only one episode. To assess how well the agent performs, we can simulate many episodes and average the total reward per episode. This average tells us how well an agent that takes random actions performs.

Let's set the number of episodes to 10,000:

 >>> n_episode = 10000

In each episode, we compute the total reward by accumulating the reward from every step:

 >>> total_rewards = []
 >>> for episode in range(n_episode):
 ...     state = env.reset()
 ...     total_reward = 0
 ...     is_done = False
 ...     while not is_done:
 ...         action = env.action_space.sample()
 ...         state, reward, is_done, _ = env.step(action)
 ...         total_reward += reward
 ...     total_rewards.append(total_reward)

Finally, we calculate the average total reward:

 >>> print('Average total reward over {} episodes: {}'.format(
 ...     n_episode, sum(total_rewards) / n_episode))
 Average total reward over 10000 episodes: 22.2473

On average, an agent that takes random actions scores a total reward of about 22.25 per episode.
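The evaluation loop above can also be wrapped in reusable functions. The following is a minimal sketch, not the recipe's own code: `ToyEnv` and `ToyActionSpace` are hypothetical stand-ins that only mimic the `reset()`, `step()`, and `action_space.sample()` calls used above, so that the example runs without any environment library installed. The names `run_episode` and `average_total_reward` are likewise assumptions made for illustration.

```python
import random


class ToyActionSpace:
    """Hypothetical stand-in for env.action_space with two discrete actions."""

    def __init__(self, rng):
        self._rng = rng

    def sample(self):
        # Pick action 0 or 1 uniformly at random.
        return self._rng.randrange(2)


class ToyEnv:
    """Hypothetical environment mimicking the reset()/step() calls used above.

    Every step yields a reward of 1.0. An episode ends with probability
    0.05 at each step, or after 200 steps at the latest.
    """

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.action_space = ToyActionSpace(self._rng)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return 0  # dummy state

    def step(self, action):
        self._steps += 1
        is_done = self._rng.random() < 0.05 or self._steps >= 200
        # Same 4-tuple shape as in the recipe: state, reward, done flag, info.
        return 0, 1.0, is_done, {}


def run_episode(env):
    """Play one episode with random actions and return its total reward."""
    env.reset()
    total_reward = 0
    is_done = False
    while not is_done:
        action = env.action_space.sample()
        _, reward, is_done, _ = env.step(action)
        total_reward += reward
    return total_reward


def average_total_reward(env, n_episode):
    """Average the total reward over n_episode random-action episodes."""
    total_rewards = [run_episode(env) for _ in range(n_episode)]
    return sum(total_rewards) / n_episode


if __name__ == "__main__":
    env = ToyEnv(seed=0)
    print(average_total_reward(env, 1000))
```

Passing the recipe's own `env` instead of `ToyEnv` should work as long as `step()` returns four values, as it does in the code above. Newer Gym and Gymnasium releases return five values from `step()` and a `(state, info)` pair from `reset()`, so the unpacking in `run_episode` would need to be adapted there.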

Taking random actions is clearly not a sophisticated strategy, and we will implement more advanced policies in upcoming recipes. But for the next recipe, let's take a break and review the basics of PyTorch.
