Getting Started with the Q-Learning Algorithm

Q-learning is an algorithm designed to solve control problems that are modeled as a Markov decision process (MDP). We will go over in detail what MDPs are, how they work, and how Q-learning is designed to solve them. We will then explore some classic reinforcement learning (RL) problems and learn how to develop solutions to them using Q-learning.

We will cover the following topics in this chapter:

  • Understanding what an MDP is and how Q-learning is designed to solve an MDP
  • Learning how to define the states an agent can be in and the actions it can take from those states, using the OpenAI Gym Taxi-v2 environment that serves as our first project
  • Becoming familiar with the alpha (learning rate), gamma (discount factor), and epsilon (exploration rate) parameters
  • Diving into a classic RL problem, the multi-armed bandit problem (MABP), and putting it into a Q-learning context
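
As a preview of how alpha, gamma, and epsilon fit together, here is a minimal sketch of one tabular Q-learning update with epsilon-greedy action selection. It uses only the Python standard library and does not depend on OpenAI Gym. The function names `choose_action` and `q_update` and the two-state toy table are illustrative assumptions, not code from the chapter's projects.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest current Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])           # value of the best next action
    td_target = reward + gamma * best_next   # discounted estimate of the return
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical toy table: 2 states x 2 actions (values chosen for illustration).
q_table = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q_table, state=0, action=1, reward=1.0,
                     next_state=1, alpha=0.5, gamma=0.9)
print(new_value)
```

Here alpha controls how far the stored value moves toward the new estimate, gamma controls how much the best value of the next state counts, and epsilon controls how often the agent tries a random action instead of the currently best-known one.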