There's more...

If we examine the reward/episode plot, we can see that training can stop early once the environment has been solved, that is, once the average reward over 100 consecutive episodes is no less than 195. To do this, we add the following lines of code inside the episode loop of the training session:

 >>> if episode >= 99 and sum(total_rewards[-100:]) >= 19500:
 ...     break
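The stopping rule can also be tried on its own, outside the training session. The following is only a minimal sketch under stated assumptions: `is_solved`, `train_with_early_stop`, and `fake_episode` are hypothetical names introduced for illustration, and `fake_episode` is a stand-in that returns a made-up reward instead of running a real episode. Testing the mean of the last 100 rewards against 195 is equivalent to testing their sum against 19500, as in the snippet above.

```python
def is_solved(total_rewards, window=100, threshold=195.0):
    """Return True once the mean of the last `window` rewards reaches `threshold`."""
    if len(total_rewards) < window:
        # Not enough episodes yet (same role as `episode >= 99` above).
        return False
    recent = total_rewards[-window:]
    return sum(recent) / window >= threshold


def train_with_early_stop(run_episode, n_episode):
    """Run up to `n_episode` episodes, stopping as soon as the task is solved."""
    total_rewards = []
    for episode in range(n_episode):
        # `run_episode` is assumed to return the total reward of one episode.
        total_rewards.append(run_episode(episode))
        if is_solved(total_rewards):
            break
    return total_rewards


def fake_episode(episode):
    # Hypothetical stand-in for a real episode: the reward grows by one
    # per episode and is capped at 200, the maximum seen in the output.
    return min(200.0, 10.0 + episode)


rewards = train_with_early_stop(fake_episode, 1000)
print(f"Stopped after {len(rewards)} episodes")
```

Keeping the check in a small function makes the window size and the threshold easy to change if a different solved criterion is used.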

Re-run the training session. You should get something similar to the following, which stops after several hundred episodes:

Episode 1: 10.0
Episode 2: 27.0
Episode 3: 28.0
Episode 4: 15.0
Episode 5: 12.0
……
……
Episode 549: 200.0
Episode 550: 200.0
Episode 551: 200.0
Episode 552: 200.0
Episode 553: 200.0