How it works...
We have just seen how effective it is to compute the value of a policy using policy evaluation. It is a simple, convergent, iterative approach in the dynamic programming family or, to be more specific, approximate dynamic programming. It starts with initial guesses for the values (all zeros in our case) and then iteratively updates them according to the Bellman expectation equation until they converge.
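For reference, the Bellman expectation equation that each update applies is the standard one:

$$V_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V_{\pi}(s')\right]$$

where $\pi(a \mid s)$ is the probability the policy assigns to action $a$ in state $s$, $P$ is the transition model, $R$ is the reward, and $\gamma$ is the discount factor.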
In Step 5, the policy evaluation function performs the following tasks (sketched in code after this list):
- Initializes the policy values as all zeros.
- Updates the values based on the Bellman expectation equation.
- Computes the maximal change of the values across all states.
- If the maximal change is greater than the threshold, it keeps updating the values. Otherwise, it terminates the evaluation process and returns the latest values.
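To make these steps concrete, here is a minimal sketch of such a function in PyTorch. The function name policy_evaluation and the tensor layout (a trans_matrix of shape [n_states, n_actions, n_states] and a rewards vector indexed by next state) are illustrative assumptions, not necessarily the recipe's exact interface:

```python
import torch

def policy_evaluation(policy, trans_matrix, rewards, gamma, threshold):
    """Evaluate a policy by iterating the Bellman expectation equation.

    policy:       [n_states, n_actions] action probabilities per state
    trans_matrix: [n_states, n_actions, n_states] transition probabilities
    rewards:      [n_states] reward received upon entering each state
    """
    n_states, n_actions = policy.shape
    V = torch.zeros(n_states)               # 1. initialize the values as all zeros
    while True:
        V_new = torch.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                # 2. Bellman expectation update: expected return of taking
                #    action a in state s, weighted by the policy's probability
                V_new[s] += policy[s, a] * torch.sum(
                    trans_matrix[s, a] * (rewards + gamma * V))
        max_delta = torch.max(torch.abs(V_new - V))  # 3. maximal change over states
        V = V_new.clone()
        if max_delta <= threshold:           # 4. converged: return the latest values
            return V

# A toy two-state, two-action MDP, just to exercise the sketch
trans_matrix = torch.tensor([[[0.7, 0.3], [0.2, 0.8]],
                             [[0.9, 0.1], [0.1, 0.9]]])
rewards = torch.tensor([1.0, 2.0])
policy = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
V = policy_evaluation(policy, trans_matrix, rewards, gamma=0.99, threshold=1e-4)
```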
Since policy evaluation uses iterative approximation, its result might not be exactly the same as the result of the matrix inversion method, which uses exact computation. In fact, we don't really need the value function to be that precise. Also, iterative evaluation mitigates the curse of dimensionality: it lets the computation scale up to millions of states, where inverting the corresponding matrix would be impractical. Therefore, we usually prefer policy evaluation over matrix inversion.
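For comparison, the matrix inversion method solves the linear Bellman system in closed form. Below is a sketch reusing the tensors from the example above (again illustrative, not the book's code):

```python
# Exact policy evaluation: V = (I - gamma * P_pi)^(-1) @ R_pi, where P_pi and
# R_pi are the transition matrix and expected rewards averaged under the policy
gamma = 0.99
P_pi = torch.einsum('sa,sat->st', policy, trans_matrix)  # [n_states, n_states]
R_pi = P_pi @ rewards                                    # expected one-step reward
V_exact = torch.inverse(torch.eye(policy.shape[0]) - gamma * P_pi) @ R_pi
```

The iterative result V should agree with V_exact up to roughly the chosen threshold.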
One more thing to remember is that policy evaluation is used to predict how great a reward we will get from a given policy; it is not used for control problems.