Hands-On Q-Learning with Python
Nazia Habib
SARSA versus Q-learning – on-policy or off?
Like Q-learning, SARSA is a model-free RL method: it learns no model of the environment's dynamics, and it does not learn an explicit policy function either, instead deriving its behavior from the action values it estimates.
The primary difference between SARSA and Q-learning is that SARSA is an on-policy method while Q-learning is an off-policy method. The practical difference between the two algorithms shows up in the step where the Q-table is updated. Let's discuss what that means with some examples:
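To make the contrast concrete, here is a minimal sketch of the two update rules on a tabular Q-table (an illustrative example with assumed helper names such as `epsilon_greedy`, `alpha`, and `gamma`; it is not code from any particular library). The only line that differs between the two agents is the one computing the TD target:

```python
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
    # Behavior policy shared by both agents (illustrative helper).
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy target: uses a_next, the action the behavior policy
    # actually selected in s_next (possibly an exploratory one).
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy target: uses the greedy (max-valued) action in s_next,
    # regardless of which action the behavior policy will actually take.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Because SARSA's target depends on the action its own exploratory policy takes next, it learns the value of the policy it is actually following. Q-learning's max operator instead learns the value of the greedy policy while behaving with a different, exploratory one, and that gap between the policy being followed and the policy being learned is exactly what "off-policy" means.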

Monte Carlo tree search (MCTS) is a planning algorithm used in model-based RL. We won't be discussing it in detail here, but it's useful to explore further as a contrast to model-free RL algorithms. Briefly, in model-based RL, we attempt to explicitly model the environment's dynamics (its transition and reward functions) and use that model for planning, so that we don't have to rely as much on trial and error in the learning process.
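As a rough illustration of the model-based idea (a hand-rolled sketch under assumed names, not MCTS itself), an agent that has learned transition probabilities `P[s, a, s']` and expected rewards `R[s, a]` can score actions by planning with its model instead of sampling the real environment:

```python
import numpy as np

def one_step_lookahead(P, R, V, state, gamma=0.99):
    # Model-based planning: evaluate each action from `state` using the
    # learned model (P, R) and current state-value estimates V, without
    # taking any real steps in the environment.
    # Assumed shapes: P is (n_states, n_actions, n_states),
    # R is (n_states, n_actions), V is (n_states,).
    n_actions = R.shape[1]
    action_values = np.zeros(n_actions)
    for a in range(n_actions):
        action_values[a] = R[state, a] + gamma * P[state, a] @ V
    return action_values
```

Model-free methods such as SARSA and Q-learning skip this modeling step entirely and estimate action values directly from sampled transitions.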