
States and actions

When first launched, your agent knows nothing about its environment and takes purely random actions.

As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty. 

The car records that penalty in the Q-table. The next time it encounters a red light, it consults the Q-table when deciding what to do. Because the move-forward action in the stopped-at-a-red-light state now has a lower value than any other action, the car is less likely to run the red light again.

Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time. 
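To make this concrete, here is a minimal sketch of how a tabular Q-learning agent might record those rewards and penalties. The state names, action names, reward values, and hyperparameters are illustrative assumptions, not the implementation we'll build later in the book:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate: how far each update moves the old value
GAMMA = 0.9   # discount factor: how much future rewards count

# Q-table: maps (state, action) pairs to learned values, defaulting to 0.0
q_table = defaultdict(float)

ACTIONS = ["forward", "left", "right", "wait"]

def update_q(state, action, reward, next_state):
    """Standard Q-learning update: nudge Q(s, a) toward the observed reward
    plus the discounted value of the best action in the next state."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )

# Running the red light earns a large penalty...
update_q("red_light", "forward", -10.0, "intersection_blocked")
# ...while waiting at it earns a small reward, so "wait" ends up with a
# higher value than "forward" in the "red_light" state.
update_q("red_light", "wait", 2.0, "green_light")
```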

While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine). The following is a sample finite state machine:
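As a minimal sketch of such a machine, the transition table below maps each (state, action) pair to a next state. The state names, actions, and transitions are illustrative assumptions, not the simulation we'll build later:

```python
# A toy finite state machine for the traffic-light scenario.
# All state and action names are made up for illustration.
transitions = {
    # (current state, action) -> next state
    ("at_red_light", "wait"): "at_green_light",
    ("at_red_light", "forward"): "ran_red_light",
    ("at_green_light", "forward"): "next_block",
    ("next_block", "forward"): "at_red_light",
}

def step(state, action):
    """Return the next state for a (state, action) pair; stay put if undefined."""
    return transitions.get((state, action), state)

state = "at_red_light"
for action in ["wait", "forward", "forward"]:
    state = step(state, action)
    print(action, "->", state)
```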

When we model a state-action function for any system, we decide which variables we want to keep track of, and those variables determine how many states the system can be in. 

For example, a state variable for a vehicle might include information about what intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent this as a vector.

The possible actions for a self-driving vehicle agent might be: move forward one block, turn left, turn right, and stop and wait. Each of these actions is paired with each possible value of the state variable, so the agent learns a value for every state-action combination, as sketched below. 
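One way to sketch this in code is to represent the state as a small vector (here, a named tuple) and create one Q-table entry per state-action pair. The field names and values below are illustrative assumptions:

```python
from collections import namedtuple

# A compact state vector: which intersection we're at, the light color,
# and whether other cars are nearby. Field names are made up for illustration.
State = namedtuple("State", ["intersection", "light", "cars_nearby"])

ACTIONS = ["forward", "left", "right", "wait"]

state = State(intersection=(2, 5), light="red", cars_nearby=True)

# Each (state, action) pair gets its own value in the Q-table.
q_table = {(state, action): 0.0 for action in ACTIONS}
print(q_table)
```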

Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.

In the model-free version of Q-learning, it's important to note that we do not learn an agent's policy explicitly. We only update the values we observe for each state-action input as the agent follows that policy. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm. 
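In practice, the agent's behavior falls out of those values rather than from an explicitly learned policy. A common way to act on them is epsilon-greedy selection, sketched below with assumed names and an assumed exploration rate:

```python
import random

EPSILON = 0.1  # exploration rate: how often to try a random action

def choose_action(q_table, state, actions):
    """The 'policy' is implicit: pick the highest-valued action for this
    state, with occasional random exploration."""
    if random.random() < EPSILON:
        return random.choice(actions)  # explore
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit
```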
