- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 2021-06-24 12:34:43
There's more...
The reward-per-episode plot suggests that we can stop training early, as soon as the environment counts as solved, that is, once the average reward over 100 consecutive episodes is no less than 195. To do so, we add the following lines inside the training loop, after the current episode's total reward has been appended to total_rewards:
>>> if episode >= 99 and sum(total_rewards[-100:]) >= 19500:
...     break
Re-run the training session. You should get something similar to the following, which stops after several hundred episodes:
Episode 1: 10.0
Episode 2: 27.0
Episode 3: 28.0
Episode 4: 15.0
Episode 5: 12.0
......
Episode 549: 200.0
Episode 550: 200.0
Episode 551: 200.0
Episode 552: 200.0
Episode 553: 200.0
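The stopping rule can also be tried on its own, without an environment. The following is a minimal sketch, not the book's training code: the helper names `is_solved` and `run_training` are hypothetical, and the reward sequence is synthetic (50 weak episodes followed by perfect ones), chosen only to show when the check fires. It assumes, as in the snippet above, that each episode's total reward is appended to `total_rewards` before the check runs.

```python
def is_solved(total_rewards, window=100, threshold=195.0):
    """Return True once the last `window` episodes average at least `threshold`.

    Equivalent to `episode >= 99 and sum(total_rewards[-100:]) >= 19500`
    when `episode` is zero-based and the current reward is already appended.
    """
    if len(total_rewards) < window:
        return False
    return sum(total_rewards[-window:]) / window >= threshold


def run_training(episode_rewards):
    """Hypothetical stand-in for the training loop: consume one total reward
    per episode and stop as soon as the task counts as solved."""
    total_rewards = []
    for reward in episode_rewards:
        total_rewards.append(reward)
        if is_solved(total_rewards):
            break
    return total_rewards


# Synthetic rewards (an assumption, not measured data).
rewards = run_training([20.0] * 50 + [200.0] * 500)
print('Stopped after {} episodes'.format(len(rewards)))
```

Because the check averages over a sliding window, training does not stop at the first perfect episode: the early, low-reward episodes must first drop out of the last 100 before the average can reach 195.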