
Solving the optimization problem

Every time your agent steps through the environment, it will update the Q-table with the rewards it has received. Once your Q-table stops updating and reaches its final state, we will know that your agent has found the optimal path to its destination. It will have solved the MDP represented by its environment.

What this means in practice is that your agent will have found the best actions to take from each state that it has encountered through its exploration of the environment. It will have learned enough about the environment to find an optimal strategy for navigating a path to the goal. When your Q-table stops updating, we say that it has converged to its final state.

We can be sure that when the Q-table converges, the agent has found the optimal solution. Q-learning, as we've discussed, is only one learning algorithm that can solve this kind of problem, and there are others that are sometimes faster or more efficient. The reason we choose Q-learning as our introduction to RL is that it is relatively simple, straightforward to learn, and it gives us a good feel for the types of problems we'll be facing in this optimization space.
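To make the idea of convergence concrete, here is a minimal sketch of a Q-learning loop with a simple convergence check: we stop once the largest change to any Q-value in an episode falls below a small threshold. The toy corridor environment, the hyperparameters, and the tolerance used here are illustrative assumptions, not part of the text:

```python
import numpy as np

# Assumed toy environment: a corridor of 5 states, goal at the right end.
# Actions: 0 = move left, 1 = move right. Reward 1 for reaching the goal.
n_states, n_actions = 5, 2
goal = n_states - 1
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameters
convergence_tol = 1e-4                  # stop when updates become this small

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    """Move left or right; reward 1 for reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for episode in range(1000):
    state, done = 0, False
    max_delta = 0.0                     # largest Q-value change this episode
    while not done:
        # Epsilon-greedy action selection (ties broken at random).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning update rule.
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        delta = target - Q[state, action]
        Q[state, action] += alpha * delta
        max_delta = max(max_delta, abs(delta))
        state = next_state
    if max_delta < convergence_tol:     # Q-table has (approximately) converged
        print(f"Converged after {episode + 1} episodes")
        break

print(Q)
```

Once the loop reports convergence, reading off the highest-valued action in each row of `Q` gives the optimal policy for this small environment, which is exactly the "optimal strategy for navigating a path to the goal" described above.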
