- Hands-On Q-Learning with Python
- Nazia Habib
- 91 words
- 2021-06-24 15:13:13
Questions
- What is the difference between a reward and a value?
- What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
- Why might a Q-learning agent not always choose the action with the highest Q-value for its current state?
- Explain one benefit of a decaying gamma.
- Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
- What kind of policy does Q-learning implicitly assume the agent is following?
- Under what circumstances will SARSA and Q-learning produce the same results?
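As a reference point for the last three questions, the two update targets can be written side by side. This is a minimal sketch, not code from the book: the function names (`epsilon_greedy`, `q_learning_target`, `sarsa_target`, `td_update`) and the sample Q-values are assumptions chosen for illustration, and the formulas are the standard tabular forms of the two algorithms.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_target(reward, next_q_row, gamma):
    """Target that bootstraps from the best action available in the next state."""
    return reward + gamma * max(next_q_row)


def sarsa_target(reward, next_q_row, next_action, gamma):
    """Target that bootstraps from the action the agent actually takes next."""
    return reward + gamma * next_q_row[next_action]


def td_update(old_value, target, alpha):
    """Move the old estimate a fraction alpha of the way toward the target."""
    return old_value + alpha * (target - old_value)


# Hypothetical Q-values for the next state (three actions) and sample settings.
next_q_row = [0.2, 0.8, 0.5]
reward, gamma, alpha = 1.0, 0.9, 0.1

off_policy = q_learning_target(reward, next_q_row, gamma)
on_policy_explore = sarsa_target(reward, next_q_row, 2, gamma)  # exploratory next action
on_policy_greedy = sarsa_target(reward, next_q_row, 1, gamma)   # greedy next action

print(off_policy, on_policy_explore, on_policy_greedy)
```

The only difference between the two targets is which next-state value is plugged in, so comparing the three printed numbers is one way to check an answer to the question about when the algorithms agree.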