
Markov Decision Process

A Markov Decision Process (MDP) is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as an MDP.

An MDP is represented by five important elements:

  • A set of states ($S$) the agent can be in.
  • A set of actions ($A$) that the agent can perform to move from one state to another.
  • A transition probability ($P_{ss'}^{a}$), which is the probability of moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A reward function ($R_{ss'}^{a}$), which gives the reward the agent receives for moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A discount factor ($\gamma$), which controls the relative importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
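
To make the five elements concrete, here is a minimal Python sketch of a toy MDP. Everything in it is an assumption made for illustration: the two states, the two actions, the probabilities, the rewards, and the helper names `step` and `discounted_return` do not come from the text. The discounted sum shown is the standard way a discount factor is applied, included only as a preview.

```python
import random

# Hypothetical two-state MDP; all names and numbers are illustrative.
states = ["s0", "s1"]        # S: set of states
actions = ["stay", "move"]   # A: set of actions

# Transition probabilities: transitions[(s, a)][s_next] = P(s_next | s, a).
transitions = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.8, "s1": 0.2},
}

# Rewards: rewards[(s, a, s_next)]; triples not listed yield 0.
rewards = {
    ("s0", "move", "s1"): 1.0,
    ("s1", "stay", "s1"): 0.5,
}

gamma = 0.9  # discount factor


def step(state, action, rng):
    """Sample a next state from the transition probabilities and look up its reward."""
    dist = transitions[(state, action)]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    reward = rewards.get((state, action, next_state), 0.0)
    return next_state, reward


def discounted_return(reward_sequence, gamma):
    """Sum rewards, weighting the reward at time step t by gamma ** t."""
    return sum((gamma ** t) * r for t, r in enumerate(reward_sequence))
```

Calling `step("s0", "move", random.Random(0))` samples one transition and returns the next state with its reward. With `gamma` close to 0 the discounted sum is dominated by the first reward, while values near 1 give later rewards almost equal weight.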