
On-policy method

On-policy methods evaluate the same policy that was used to select actions. On-policy algorithms generally do not use a replay buffer; the experience encountered is used to train the model in situ. The policy that moved the agent from the state at time t to the state at time t+1 is also the policy used to judge whether that behavior was good or bad. For example, if a robot exploring the world uses its current policy to ascertain whether the actions it took in the current state were good or bad, it is following an on-policy algorithm, because the current policy is also the one used to evaluate its actions. SARSA, A3C, TRPO, and PPO are on-policy algorithms that we will be covering in this book.
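The defining trait described above can be seen in SARSA's update rule: the action used in the update target is drawn from the same policy that is acting. The following is a minimal sketch on an assumed toy chain environment (the environment, hyperparameters, and helper names here are illustrative, not from the book):

```python
import random

# Toy setup (assumption): a 5-state chain where reaching the rightmost
# state yields reward 1 and ends the episode. The SAME epsilon-greedy
# policy both selects actions and supplies the next action used in the
# update target -- this is what makes SARSA on-policy.

N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(state, action):
    """Toy transition: reach state N_STATES-1 to receive reward 1."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False

def choose(Q, state):
    """Epsilon-greedy action from the current policy."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = choose(Q, s)               # action chosen by current policy
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = choose(Q, s2)         # next action from the SAME policy
            target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa()
```

An off-policy method such as Q-learning would instead use `max(Q[(s2, a)] for a in ACTIONS)` in the target, evaluating the greedy policy regardless of which action the behavior policy actually takes next.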
