
Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. The chapter starts by creating a Markov chain and then an MDP, the formalism at the core of most reinforcement learning algorithms. You will also become more familiar with the Bellman equations by practicing policy evaluation. We will then apply two dynamic programming approaches to solving an MDP: value iteration and policy iteration. We will use the FrozenLake environment as an example. At the end of the chapter, we will demonstrate step by step how to solve the interesting coin-flipping gamble problem with dynamic programming.

The following recipes will be covered in this chapter:

  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem
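As a small preview of the Bellman equations mentioned above, the following is a minimal sketch of iterative policy evaluation written in plain Python (the recipes themselves work with PyTorch tensors). It assumes state-based rewards R(s) and repeatedly applies the Bellman expectation update V(s) = R(s) + γ Σ_a π(a|s) Σ_s' T(s, a, s') V(s') until the largest change falls below a threshold. The function name `policy_evaluation`, the nested-list layout of `policy` and `trans`, and the example numbers are illustrative assumptions, not the chapter's own code.

```python
def policy_evaluation(policy, trans, rewards, gamma, threshold=1e-6):
    """Iterative policy evaluation with the Bellman expectation equation.

    Hypothetical data layout (an assumption for this sketch):
      policy[s][a]     probability of taking action a in state s
      trans[s][a][s2]  probability of moving from s to s2 under action a
      rewards[s]       reward received in state s (state-based reward)
    """
    n_states = len(rewards)
    V = [0.0] * n_states  # start from all-zero state values
    while True:
        V_new = []
        for s in range(n_states):
            # Expected value of the next state under the policy and dynamics
            expected_next = 0.0
            for a, p_a in enumerate(policy[s]):
                for s2, p_s2 in enumerate(trans[s][a]):
                    expected_next += p_a * p_s2 * V[s2]
            V_new.append(rewards[s] + gamma * expected_next)
        # Stop once no state value changed by more than the threshold
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta <= threshold:
            return V


if __name__ == "__main__":
    # A made-up 2-state, 2-action MDP evaluated under a uniformly random policy
    trans = [
        [[0.8, 0.2], [0.1, 0.9]],  # from state 0 under actions 0 and 1
        [[0.7, 0.3], [0.0, 1.0]],  # from state 1 under actions 0 and 1
    ]
    rewards = [1.0, 0.0]
    policy = [[0.5, 0.5], [0.5, 0.5]]
    print(policy_evaluation(policy, trans, rewards, gamma=0.9))
```

Because γ < 1 makes the update a contraction, the loop is guaranteed to terminate; a smaller `threshold` trades more iterations for values closer to the exact solution of the linear system V = R + γ P_π V.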