
Markov Decision Process

The Markov decision process, better known as MDP, is an approach in reinforcement learning for making decisions in a gridworld environment. A gridworld environment consists of states laid out in a grid, such as the FrozenLake-v0 environment from OpenAI Gym, which we examined and solved in the previous chapter.
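To make this concrete, here is a minimal sketch of loading and inspecting that gridworld, assuming the classic OpenAI Gym API (newer Gym/Gymnasium releases rename the environment to FrozenLake-v1 and change the reset()/step() return values):

```python
import gym

env = gym.make("FrozenLake-v0")

print(env.observation_space)   # Discrete(16): one state per cell of the 4x4 grid
print(env.action_space)        # Discrete(4): left, down, right, up

state = env.reset()            # start at the top-left cell (state 0)
# take one random action; classic Gym returns (state, reward, done, info)
state, reward, done, info = env.step(env.action_space.sample())
```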

The MDP tries to capture such a world by dividing it into states, actions, models (also called transition models), and rewards. The solution to an MDP is called a policy, and the objective is to find the optimal policy for the given MDP task.
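As a rough illustration (a hypothetical two-state world, not the book's example), these ingredients can be written down as plain Python data; FrozenLake-v0 exposes a similar table via env.unwrapped.P, whose entries additionally carry a done flag:

```python
states = ["s0", "s1"]
actions = ["stay", "move"]

# transition model: P[state][action] -> list of (probability, next_state, reward)
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "move": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.0)],
           "move": [(1.0, "s1", 0.0)]},
}

# a policy maps each state to an action; solving the MDP means finding the
# policy that maximizes the expected cumulative reward
policy = {"s0": "move", "s1": "stay"}
```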

Thus, any reinforcement learning task composed of a set of states, actions, and rewards that follows the Markov property would be considered an MDP.
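Formally, the Markov property states that the next state depends only on the current state and action, not on the full history that preceded them; in standard notation:

P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_1, a_1, ..., s_t, a_t)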

In this chapter, we will dig deep into MDPs, states, actions, rewards, policies, and how to solve them using Bellman equations. Moreover, we will cover the basics of partially observable MDPs (POMDPs) and the complexity involved in solving them. We will also cover the exploration-exploitation dilemma and the famous E3 (explicit explore or exploit) algorithm. Then we will come to the fascinating part, where we will program an agent to learn and play Pong using the principles of MDP.
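As a preview of where the Bellman equations lead, here is a minimal value-iteration sketch (one standard way to solve an MDP, not the book's exact code), which repeatedly applies the Bellman optimality update to the hypothetical two-state transition table from the earlier sketch:

```python
# the same hypothetical two-state MDP as above
states = ["s0", "s1"]
actions = ["stay", "move"]
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "move": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.0)],
           "move": [(1.0, "s1", 0.0)]},
}

def value_iteration(states, actions, P, gamma=0.9, tol=1e-6):
    V = {s: 0.0 for s in states}          # start with zero value everywhere
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality update:
            # V(s) = max_a sum over (p, s', r) of p * (r + gamma * V(s'))
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:                   # stop once values have converged
            return V

print(value_iteration(states, actions, P))  # V(s0) ~ 0.976, V(s1) = 0.0
```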

We will cover the following topics in this chapter:

  • Markov decision processes
  • Partially observable Markov decision processes
  • Training the FrozenLake-v0 environment using MDP