
Solving the optimization problem

Every time your agent steps through the environment, it will update the Q-table with the rewards it has received. Once your Q-table stops updating and reaches its final state, we will know that your agent has found the optimal path to its destination. It will have solved the MDP represented by its environment.

What this means in practice is that your agent will have found the best actions to take from each state that it has encountered through its exploration of the environment. It will have learned enough about the environment to find an optimal strategy for navigating a path to the goal. When your Q-table stops updating, we say that it has converged to its final state.

We can be sure that when the Q-table converges, the agent has found the optimal solution. Q-learning, as we've discussed, is only one learning algorithm that can solve this kind of problem, and there are others that are sometimes faster or more efficient. The reason we choose Q-learning as our introduction to RL is that it is relatively simple, straightforward to learn, and it gives us a good feel for the types of problems we'll be facing in this optimization space.
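To make the idea of convergence concrete, here is a minimal sketch of a Q-learning loop with a simple convergence check: we stop once the largest change to any Q-value in an episode falls below a small threshold. The toy corridor environment, the hyperparameters, and the tolerance used here are illustrative assumptions, not part of the text:

```python
import numpy as np

# Assumed toy environment: a corridor of 5 states, goal at the right end.
# Actions: 0 = move left, 1 = move right. Reward 1 for reaching the goal.
n_states, n_actions = 5, 2
goal = n_states - 1
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameters
convergence_tol = 1e-4                  # stop when updates become this small

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    """Move left or right; reward 1 for reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for episode in range(1000):
    state, done = 0, False
    max_delta = 0.0                     # largest Q-value change this episode
    while not done:
        # Epsilon-greedy action selection (ties broken at random).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning update rule.
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        delta = target - Q[state, action]
        Q[state, action] += alpha * delta
        max_delta = max(max_delta, abs(delta))
        state = next_state
    if max_delta < convergence_tol:     # Q-table has (approximately) converged
        print(f"Converged after {episode + 1} episodes")
        break

print(Q)
```

Once the loop reports convergence, reading off the highest-valued action in each row of `Q` gives the optimal policy for this small environment, which is exactly the "optimal strategy for navigating a path to the goal" described above.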
