
SARSA versus Q-learning – on-policy or off?

Like Q-learning, SARSA is a model-free RL method: it learns an action-value (Q) function directly from experience rather than an explicit model of the environment, and the agent's policy is derived implicitly from those Q-values.

The primary difference between SARSA and Q-learning is that SARSA is an on-policy method while Q-learning is an off-policy method. In practice, the difference between the two algorithms shows up in the step where the Q-table is updated. Let's discuss what that means with some examples:
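Here is a minimal sketch of the two update rules in Python, assuming a tabular setting with a NumPy Q-table. The state/action counts and the hyperparameters (alpha, gamma, epsilon) are placeholder values chosen for illustration:

import numpy as np

n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state):
    # Behavior policy used by both algorithms to select actions.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target bootstraps from a_next, the action the
    # epsilon-greedy behavior policy actually takes in s_next.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target bootstraps from the greedy (max) action
    # in s_next, regardless of what the behavior policy does next.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

The two update functions differ only in their bootstrap target: SARSA evaluates the action its own behavior policy will actually take next (on-policy), while Q-learning always evaluates the greedy action, so it learns about the greedy policy even while behaving epsilon-greedily (off-policy).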

Monte Carlo tree search (MCTS) is a type of model-based RL. We won't be discussing it in detail here, but it's useful to explore further as a contrast to model-free RL algorithms. Briefly, in model-based RL, we attempt to explicitly model the environment's dynamics (its transitions and rewards) and use that model to plan, so that we don't have to rely as much on trial and error in the learning process.
