
SARSA versus Q-learning – on-policy or off?

Like Q-learning, SARSA is a model-free RL method: it learns an action-value (Q) function directly from experience rather than an explicit model of the environment, and the agent's policy is derived implicitly from those Q-values.

The primary difference between SARSA and Q-learning is that SARSA is an on-policy method while Q-learning is an off-policy method. In practice, the difference between the two algorithms shows up in the step where the Q-table is updated. Let's discuss what that means with some examples:
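Here is a minimal sketch of the two update rules in Python, assuming a tabular setting with a NumPy Q-table. The state/action counts and the hyperparameters (alpha, gamma, epsilon) are placeholder values chosen for illustration:

import numpy as np

n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state):
    # Behavior policy used by both algorithms to select actions.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target bootstraps from a_next, the action the
    # epsilon-greedy behavior policy actually takes in s_next.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target bootstraps from the greedy (max) action
    # in s_next, regardless of what the behavior policy does next.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

The two update functions differ only in their bootstrap target: SARSA evaluates the action its own behavior policy will actually take next (on-policy), while Q-learning always evaluates the greedy action, so it learns about the greedy policy even while behaving epsilon-greedily (off-policy).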

Monte Carlo tree search (MCTS) is a type of model-based RL. We won't be discussing it in detail here, but it's useful to explore further as a contrast to model-free RL algorithms. Briefly, in model-based RL, we attempt to explicitly model the environment's dynamics (its transitions and rewards) and use that model to plan, so that we don't have to rely as much on trial and error in the learning process.
