- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:16
Decaying epsilon
We've discussed epsilon decay in the context of exploration versus exploitation. The more we learn about our environment, the less random exploration we want to do, and the more often we want to take actions that we know will give us high rewards. As learning progresses, our goal shifts toward taking advantage of what we already know.
We do this by reducing the agent's epsilon value by a small amount (or by a constant factor) as the game progresses. Remember that epsilon is the probability, a value between 0 and 1, that the agent will take a random action instead of the action with the highest Q-value for its current state. For example, an epsilon of 0.1 means a 10% chance of acting randomly.
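To make the role of epsilon concrete, here is a minimal epsilon-greedy action selector. It is a sketch under assumptions not stated in the text: the Q-table is a plain list of lists indexed as `q_table[state][action]`, and `choose_action` is a hypothetical helper name.

```python
import random


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy selection over a Q-table stored as a list of lists.

    With probability epsilon, return a uniformly random action (explore);
    otherwise return the action with the highest Q-value for this state (exploit).
    """
    q_values = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action, uniformly
    # exploit: index of the largest Q-value (first one wins on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_table = [[0.1, 0.9], [0.5, 0.2]]
print(choose_action(q_table, state=0, epsilon=0.0))  # → 1 (epsilon 0 is always greedy)
```

The random branch can still land on the greedy action by chance, so the probability of actually deviating from the greedy action is slightly lower than epsilon itself.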
As epsilon shrinks, random actions become less likely, and the agent more often exploits the high-valued actions it has already discovered.
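Two common ways to shrink epsilon are subtracting a fixed step or multiplying by a constant factor, each with a floor so that some exploration always remains. The sketch below compares them over 1,000 episodes; the step size, rate, floor, and function names are illustrative assumptions, not values from the text.

```python
def linear_decay(epsilon, step=0.001, min_epsilon=0.01):
    """Subtract a fixed amount, never dropping below min_epsilon."""
    return max(min_epsilon, epsilon - step)


def exponential_decay(epsilon, rate=0.995, min_epsilon=0.01):
    """Multiply by a constant factor, never dropping below min_epsilon."""
    return max(min_epsilon, epsilon * rate)


lin_eps, exp_eps = 1.0, 1.0
lin_hist, exp_hist = [], []
for episode in range(1000):
    # ... run one episode here, selecting actions with the current epsilon ...
    lin_hist.append(lin_eps)
    exp_hist.append(exp_eps)
    lin_eps = linear_decay(lin_eps)
    exp_eps = exponential_decay(exp_eps)

# Probability of a random action at a few points in training.
for ep in (0, 100, 500, 999):
    print(f"episode {ep:4d}: linear={lin_hist[ep]:.3f}  exponential={exp_hist[ep]:.3f}")
```

Linear decay removes the same amount of exploration every episode, whereas exponential decay removes a fixed fraction, so its steps shrink as epsilon gets smaller.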
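For similar reasons, it can be beneficial to decay the learning rate, alpha, and the discount factor, gamma, along with epsilon. A smaller alpha later in training means each new experience shifts the learned Q-values less, so estimates that are already good are not overwritten by noisy updates.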
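Decaying alpha and gamma can sit in the same per-episode loop as epsilon. The sketch below assumes the standard tabular Q-learning update and simple multiplicative decay with a floor for each parameter; the helper names, starting values, rates, and floors are hypothetical. Lowering gamma makes the agent weigh future rewards less, which changes what it is optimizing, so the gamma schedule in particular is only illustrative.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma, done=False):
    """One tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    If the episode ended (done=True), the target is just the reward."""
    td_target = reward if done else reward + gamma * max(q_table[next_state])
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error


def decay(value, rate, floor):
    """Multiplicative decay that never drops below `floor` (hypothetical helper)."""
    return max(floor, value * rate)


# Illustrative starting values, rates, and floors (assumptions, not from the text).
epsilon, alpha, gamma = 1.0, 0.5, 0.99
for episode in range(500):
    # ... run one episode, calling q_update(..., alpha, gamma) after each step ...
    epsilon = decay(epsilon, rate=0.99, floor=0.01)
    alpha = decay(alpha, rate=0.995, floor=0.05)
    gamma = decay(gamma, rate=0.9995, floor=0.9)

print(f"epsilon={epsilon}, alpha={alpha}, gamma={gamma}")
```

Giving each parameter its own floor keeps the agent from stopping exploration entirely, stopping learning entirely, or becoming completely short-sighted.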