Questions

What is the difference between a reward and a value?
What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
Explain one benefit of a decaying gamma.
Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
What kind of policy does Q-learning implicitly assume the agent is following?
Under what circumstances will SARSA and Q-learning produce the same results?

官术网_书友最值得收藏!