Markov Decision Process

The Markov decision process, better known as the MDP, is a framework in reinforcement learning for making decisions in a gridworld environment. A gridworld environment consists of states laid out in the form of a grid, such as the FrozenLake-v0 environment from OpenAI Gym, which we examined and solved in the previous chapter.
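As a quick refresher, here is a minimal sketch of loading that environment and inspecting its grid of states (a sketch assuming the classic gym package used in the previous chapter, in which FrozenLake-v0 is still available):

    import gym

    # FrozenLake-v0 is a 4x4 gridworld with 16 discrete states
    env = gym.make('FrozenLake-v0')
    env.reset()

    print(env.observation_space.n)  # 16 states, one per grid cell
    print(env.action_space.n)       # 4 actions: left, down, right, up
    env.render()                    # prints the current grid layout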

An MDP tries to capture the world in the form of a grid by dividing it into states, actions, a transition model, and rewards. The solution to an MDP is called a policy, and the objective is to find the optimal policy for that MDP task.
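To make these components concrete, the following is a minimal sketch of a tiny, hypothetical gridworld MDP, with the transition model and rewards held in plain dictionaries (the state and action names are illustrative assumptions, not taken from any particular environment):

    # A toy deterministic gridworld MDP, for illustration only
    states = ['s0', 's1', 's2', 's3']          # one state per grid cell
    actions = ['left', 'right', 'up', 'down']

    # Transition model: T[(state, action)] -> list of (probability, next_state);
    # a stochastic model would spread the probability over several next states
    T = {
        ('s0', 'right'): [(1.0, 's1')],
        ('s0', 'down'):  [(1.0, 's2')],
        ('s1', 'down'):  [(1.0, 's3')],
        ('s2', 'right'): [(1.0, 's3')],
    }

    # Reward model: the reward received on entering each state
    R = {'s0': 0.0, 's1': 0.0, 's2': 0.0, 's3': 1.0}  # s3 is the goal

    # A policy maps each state to an action; the optimal policy is the one
    # that maximizes the expected cumulative reward
    policy = {'s0': 'right', 's1': 'down', 's2': 'right'}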

Thus, any reinforcement learning task that is composed of a set of states, actions, and rewards, and whose dynamics satisfy the Markov property, can be considered an MDP.
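Formally, the Markov property says that the next state depends only on the current state and action, not on the history that led there:

    P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_1, a_1, s_2, a_2, ..., s_t, a_t)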

In this chapter, we will dig deep into MDPs: states, actions, rewards, policies, and how to solve them using Bellman equations. Moreover, we will cover the basics of partially observable MDPs (POMDPs) and the complexity involved in solving them. We will also cover the exploration-exploitation dilemma and the famous E3 (Explicit Explore or Exploit) algorithm. Then we will come to the fascinating part, where we will program an agent to learn and play Pong using the principles of an MDP.
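For reference, in one common notation, the Bellman optimality equation that we will use relates the optimal value V*(s) of a state to the values of its successor states, discounted by a factor γ:

    V*(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ]

Here, T(s, a, s') is the probability of moving from state s to state s' under action a, and R(s, a, s') is the reward received for that transition.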

We will cover the following topics in this chapter:

  • Markov decision processes
  • Partially observable Markov decision processes
  • Training the FrozenLake-v0 environment using MDP