Title: Hands-On Q-Learning with Python
Author: Nazia Habib
Updated: 2021-06-24 15:13:09

States, actions, and rewards

What does it mean to be in a state, to take an action, or to receive a reward? These are the most important concepts for us to understand intuitively, so let's dig deeper into them. The following diagram depicts the agent-environment interaction in an MDP:

The agent interacts with the environment through actions, and it receives rewards and state information from the environment. In other words, the states and rewards are feedback from the environment, and the actions are inputs to the environment from the agent.
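To make this feedback loop concrete, here is a minimal sketch of one episode of agent-environment interaction. The `ToyEnvironment` class, its `reset` and `step` methods, the `always_forward` policy, and the reward values are all hypothetical names and numbers chosen for illustration; they are not taken from the book or from any particular library.

```python
class ToyEnvironment:
    """Hypothetical 1-D track: the agent starts at position 0 and must reach `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        # Return the initial state to the agent.
        self.position = 0
        return self.position

    def step(self, action):
        # The action is the agent's input to the environment:
        # 1 means "move forward", 0 means "wait".
        self.position += action
        done = self.position >= self.goal
        # Illustrative rewards: -1 per step, +10 for reaching the goal.
        reward = 10 if done else -1
        # The state and reward are the environment's feedback to the agent.
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the final state and the total reward."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                  # agent -> environment
        state, reward, done = env.step(action)  # environment -> agent
        total_reward += reward
        if done:
            break
    return state, total_reward


def always_forward(state):
    # A trivial fixed policy: ignore the state and always move forward.
    return 1


final_state, total = run_episode(ToyEnvironment(goal=3), always_forward)
print(final_state, total)
```

The loop body is the whole interaction in miniature: the policy maps the current state to an action, and `step` answers with the next state and a reward.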

Going back to our simple driving simulator example, our agent might be moving or stopped at a red light, turning left or right, or heading straight. There might be other cars in the intersection, or there might not be. Our distance from the destination will be X units.
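One way to picture such a state is as a small record that bundles these observations together. The sketch below is only an assumption about how the driving state could be encoded; the `DrivingState` class and its field names are hypothetical and do not come from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingState:
    """Hypothetical encoding of the driving simulator's state."""
    moving: bool                      # False when stopped, e.g. at a red light
    heading: str                      # "left", "right", or "straight"
    other_cars_in_intersection: bool
    distance_to_destination: int      # the "X units" from the text


state = DrivingState(
    moving=False,
    heading="straight",
    other_cars_in_intersection=True,
    distance_to_destination=5,
)

# Two states with the same observations compare equal and hash identically,
# so a state can serve as a dictionary key (for example, in a lookup table).
same_state = DrivingState(False, "straight", True, 5)
visit_counts = {state: 1}
visit_counts[same_state] += 1
print(visit_counts[state])
```

Making the dataclass frozen keeps each state immutable and hashable, which is convenient when states are used as keys in a table.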
