
Model-free and model-based training

RL algorithms that do not learn a model of how the environment works are called model-free algorithms; by contrast, algorithms that construct a model of the environment are called model-based. In general, algorithms that rely only on value (V) or action-value (Q) functions to evaluate performance are model-free, since no explicit model of the environment is used. On the other hand, if you build a model of how the environment transitions from one state to another, or of how much reward the agent will receive, then the algorithm is model-based.

In model-free algorithms, as mentioned above, we do not construct a model of the environment. Thus, the agent has to actually take an action in a state to find out whether it was a good or a bad choice. In model-based RL, an approximate model of the environment is learned, either jointly with the policy or a priori. This model of the environment is used both to make decisions and to train the policy. We will learn more about both classes of RL algorithms in later chapters.
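To make the distinction concrete, here is a minimal sketch in Python of both approaches on a toy tabular problem. All names, sizes, and hyperparameters here are illustrative assumptions, not taken from the text: the model-free part is a standard Q-learning update applied directly to experienced transitions, while the model-based part estimates transition probabilities and rewards from counts and then plans with a one-step lookahead.

import numpy as np

# Toy setting: a handful of states and actions, chosen for illustration.
n_states, n_actions = 4, 2
gamma = 0.99

# --- Model-free: tabular Q-learning ---
# No model of the environment is kept; Q(s, a) is updated directly
# from each experienced transition (s, a, r, s').
Q = np.zeros((n_states, n_actions))
alpha = 0.1

def q_learning_update(s, a, r, s_next):
    # TD update: nudge Q(s, a) toward the bootstrapped target.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# --- Model-based: learn transition and reward models from experience ---
transition_counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
visit_counts = np.zeros((n_states, n_actions))

def update_model(s, a, r, s_next):
    # Record the observed transition and reward for (s, a).
    transition_counts[s, a, s_next] += 1
    reward_sum[s, a] += r
    visit_counts[s, a] += 1

def planned_value(s, a, V):
    # One-step lookahead with the learned model: estimated reward plus
    # the discounted expected value of the predicted next state.
    n = visit_counts[s, a]
    if n == 0:
        return 0.0
    p_next = transition_counts[s, a] / n   # estimated P(s' | s, a)
    r_hat = reward_sum[s, a] / n           # estimated reward for (s, a)
    return r_hat + gamma * p_next @ V

The key difference is visible in the two update paths: q_learning_update never asks how the environment works, whereas planned_value can evaluate an action using the learned model, without taking that action again in the real environment.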
