- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 2021-06-24 12:34:40
Implementing and evaluating a random search policy
Having practiced some PyTorch programming, from this recipe onward we will work on policies for the CartPole problem that are more sophisticated than purely random actions. We start with the random search policy.
A simple yet effective approach is to map an observation to a vector of two numbers, one score per action, and pick the action with the higher value. The linear mapping is represented by a weight matrix of size 4 x 2, since the observations are 4-dimensional and there are two actions. In each episode, a weight matrix is randomly generated and used to compute the action at every step of that episode, and the total reward is then calculated. This process repeats for many episodes; in the end, the weight that achieved the highest total reward becomes the learned policy. The approach is called random search because the weight is picked at random in each trial, in the hope that a good weight will be found over a large number of trials.
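The procedure described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the book's listing: the function names `run_episode` and `random_search`, the use of `torch.rand` to draw the weights, and the assumption that `env` follows the classic Gym interface (`reset()` returns an observation, `step(action)` returns a 4-tuple) are all choices made here for the example.

```python
import torch


def run_episode(env, weight):
    """Run one episode with a fixed linear policy and return its total reward.

    `env` is assumed to expose the classic Gym-style interface:
    reset() -> observation, step(action) -> (observation, reward, done, info).
    """
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        # Map the observation to one score per action via the weight matrix.
        state_t = torch.as_tensor(state, dtype=torch.float32)
        scores = torch.matmul(state_t, weight)
        # Pick the action with the higher score.
        action = torch.argmax(scores).item()
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


def random_search(env, n_episodes, n_state=4, n_action=2):
    """Draw one random weight per episode and keep the best-performing one."""
    best_reward = float("-inf")
    best_weight = None
    for _ in range(n_episodes):
        # A fresh random 4 x 2 weight matrix for this trial (assumed sampler).
        weight = torch.rand(n_state, n_action)
        total_reward = run_episode(env, weight)
        if total_reward > best_reward:
            best_reward = total_reward
            best_weight = weight
    return best_weight, best_reward
```

With a classic Gym installation this might be called as, for example, `random_search(gym.make('CartPole-v0'), n_episodes=1000)`. Newer Gym and Gymnasium releases return additional values from `reset` and `step`, so the unpacking in `run_episode` would need adjusting there.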