- Hands-On Q-Learning with Python
- Nazia Habib
When to choose SARSA over Q-learning
As mentioned earlier, Q-learning and SARSA are very similar algorithms, and in fact, Q-learning is sometimes called SARSA-max. When the agent's policy is simply the greedy one (that is, it chooses the highest-valued action from the next state no matter what), Q-learning and SARSA will produce the same results.
In practice, we will not be using a simple greedy strategy; instead, we will choose something such as epsilon-greedy, where some of the actions are chosen at random. We will explore this in more depth when we discuss epsilon decay strategies later on.
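As a rough illustration, an epsilon-greedy selector might look like the following sketch. It assumes the Q-table is a NumPy array indexed as Q[state, action]; the function name and array layout are illustrative assumptions, not the book's own code:

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy_action(Q, state, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        # Explore: choose uniformly at random among all actions.
        return int(rng.integers(n_actions))
    # Exploit: choose the highest-valued action for this state.
    return int(np.argmax(Q[state]))
```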
We can, therefore, think of SARSA as a more general version of Q-learning. The two algorithms are so similar that, in practice, converting a Q-learning implementation into a SARSA one involves nothing more than changing the update rule for the Q-values. As we've seen, however, the difference in performance can be profound.
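A minimal sketch of that single difference, assuming a tabular NumPy Q-table and the usual alpha (learning rate) and gamma (discount factor) hyperparameters (the function names are illustrative):

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha, gamma):
    # Q-learning bootstraps from the best action in the next state (off-policy).
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])

def sarsa_update(Q, state, action, reward, next_state, next_action, alpha, gamma):
    # SARSA bootstraps from the action the agent actually takes next (on-policy).
    target = reward + gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (target - Q[state, action])
```

Note that if the behavior policy were purely greedy, next_action would always equal np.argmax(Q[next_state]), so the two targets would coincide, which is exactly the SARSA-max relationship described above.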
In many problems, SARSA will perform better than Q-learning, especially when there is a good chance that the agent will take a random, suboptimal action in the next step, as we explored in the cliff-walking example. In that case, Q-learning's assumption that the agent is following the optimal policy may be far enough from the truth that SARSA will converge faster and with fewer errors.