書名： TensorFlow Reinforcement Learning Quick Start Guide
作者名： Kaushik Balakrishnan
本章字數： 138字
更新時間： 2021-06-24 15:29:07

On-policy method

On-policy methods use the same policy to evaluate as was used to make the decisions on actions. On-policy algorithms generally do not have a replay buffer; the experience encountered is used to train the model in situ. The same policy that was used to move the agent from state at time t to state at time t+1, is used to evaluate if the performance was good or bad. For example, if a robot is exploring the world at a given state, if it uses its current policy to ascertain whether the actions it took in the current state were good or bad, then it is an on-policy algorithm, as the current policy is also used to evaluate its actions. SARSA, A3C, TRPO, and PPO are on-policy algorithms that we will be covering in this book.

官术网_书友最值得收藏!

TensorFlow Reinforcement Learning Quick Start Guide

On-policy method