
Questions

  1. What is the difference between a reward and a value?
  2. What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter. 
  3. Why might a Q-learning agent not always choose the action with the highest Q-value for its current state?
  4. Explain one benefit of a decaying gamma.
  5. Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
  6. What kind of policy does Q-learning implicitly assume the agent is following?
  7. Under what circumstances will SARSA and Q-learning produce the same results?