
Model-free and model-based training

RL algorithms that do not learn a model of how the environment works are called model-free algorithms; by contrast, algorithms that construct a model of the environment are called model-based. In general, algorithms that rely only on value (V) or action-value (Q) functions to evaluate performance are model-free, since no explicit model of the environment is used. On the other hand, if you build a model of how the environment transitions from one state to another, or of how much reward the agent will receive, then the algorithm is model-based.

In model-free algorithms, as mentioned above, we do not construct a model of the environment. Thus, the agent has to actually take an action in a state to find out whether it was a good or a bad choice. In model-based RL, an approximate model of the environment is learned, either jointly with the policy or a priori. This model of the environment is used both to make decisions and to train the policy. We will learn more about both classes of RL algorithms in later chapters.
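To make the distinction concrete, here is a minimal sketch in Python of both approaches on a toy tabular problem. All names, sizes, and hyperparameters here are illustrative assumptions, not taken from the text: the model-free part is a standard Q-learning update applied directly to experienced transitions, while the model-based part estimates transition probabilities and rewards from counts and then plans with a one-step lookahead.

import numpy as np

# Toy setting: a handful of states and actions, chosen for illustration.
n_states, n_actions = 4, 2
gamma = 0.99

# --- Model-free: tabular Q-learning ---
# No model of the environment is kept; Q(s, a) is updated directly
# from each experienced transition (s, a, r, s').
Q = np.zeros((n_states, n_actions))
alpha = 0.1

def q_learning_update(s, a, r, s_next):
    # TD update: nudge Q(s, a) toward the bootstrapped target.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# --- Model-based: learn transition and reward models from experience ---
transition_counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
visit_counts = np.zeros((n_states, n_actions))

def update_model(s, a, r, s_next):
    # Record the observed transition and reward for (s, a).
    transition_counts[s, a, s_next] += 1
    reward_sum[s, a] += r
    visit_counts[s, a] += 1

def planned_value(s, a, V):
    # One-step lookahead with the learned model: estimated reward plus
    # the discounted expected value of the predicted next state.
    n = visit_counts[s, a]
    if n == 0:
        return 0.0
    p_next = transition_counts[s, a] / n   # estimated P(s' | s, a)
    r_hat = reward_sum[s, a] / n           # estimated reward for (s, a)
    return r_hat + gamma * p_next @ V

The key difference is visible in the two update paths: q_learning_update never asks how the environment works, whereas planned_value can evaluate an action using the learned model, without taking that action again in the real environment.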
