
States and actions

When first launched, your agent knows nothing about its environment and takes purely random actions.

As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty. 

The car records that penalty in the Q-table. The next time it encounters a red light, it consults the Q-table when deciding what to do. Because the move-forward action in the stopped-at-a-red-light state now has a lower value than any other action, the car is less likely to run the red light again.

Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time. 
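To make this concrete, here is a minimal sketch of how a tabular Q-learning agent might record those rewards and penalties. The state names, action names, reward values, and hyperparameters are illustrative assumptions, not the implementation we'll build later in the book:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate: how far each update moves the old value
GAMMA = 0.9   # discount factor: how much future rewards count

# Q-table: maps (state, action) pairs to learned values, defaulting to 0.0
q_table = defaultdict(float)

ACTIONS = ["forward", "left", "right", "wait"]

def update_q(state, action, reward, next_state):
    """Standard Q-learning update: nudge Q(s, a) toward the observed reward
    plus the discounted value of the best action in the next state."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )

# Running the red light earns a large penalty...
update_q("red_light", "forward", -10.0, "intersection_blocked")
# ...while waiting at it earns a small reward, so "wait" ends up with a
# higher value than "forward" in the "red_light" state.
update_q("red_light", "wait", 2.0, "green_light")
```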

While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine). The following is a sample finite state machine:
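As a minimal sketch of such a machine, the transition table below maps each (state, action) pair to a next state. The state names, actions, and transitions are illustrative assumptions, not the simulation we'll build later:

```python
# A toy finite state machine for the traffic-light scenario.
# All state and action names are made up for illustration.
transitions = {
    # (current state, action) -> next state
    ("at_red_light", "wait"): "at_green_light",
    ("at_red_light", "forward"): "ran_red_light",
    ("at_green_light", "forward"): "next_block",
    ("next_block", "forward"): "at_red_light",
}

def step(state, action):
    """Return the next state for a (state, action) pair; stay put if undefined."""
    return transitions.get((state, action), state)

state = "at_red_light"
for action in ["wait", "forward", "forward"]:
    state = step(state, action)
    print(action, "->", state)
```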

When we model a state-action function for any system, we decide which variables we want to keep track of, and those variables determine how many states the system can be in. 

For example, a state variable for a vehicle might include information about what intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent this as a vector.

The possible actions for a self-driving vehicle agent might be: move forward one block, turn left, turn right, and stop and wait. Each of these actions is paired with each possible value of the state variable, so the agent learns a value for every state-action combination, as sketched below. 
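One way to sketch this in code is to represent the state as a small vector (here, a named tuple) and create one Q-table entry per state-action pair. The field names and values below are illustrative assumptions:

```python
from collections import namedtuple

# A compact state vector: which intersection we're at, the light color,
# and whether other cars are nearby. Field names are made up for illustration.
State = namedtuple("State", ["intersection", "light", "cars_nearby"])

ACTIONS = ["forward", "left", "right", "wait"]

state = State(intersection=(2, 5), light="red", cars_nearby=True)

# Each (state, action) pair gets its own value in the Q-table.
q_table = {(state, action): 0.0 for action in ACTIONS}
print(q_table)
```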

Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.

In the model-free version of Q-learning, it's important to note that we do not learn an agent's policy explicitly. We only update the values we observe for each state-action input as the agent follows that policy. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm. 
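In practice, the agent's behavior falls out of those values rather than from an explicitly learned policy. A common way to act on them is epsilon-greedy selection, sketched below with assumed names and an assumed exploration rate:

```python
import random

EPSILON = 0.1  # exploration rate: how often to try a random action

def choose_action(q_table, state, actions):
    """The 'policy' is implicit: pick the highest-valued action for this
    state, with occasional random exploration."""
    if random.random() < EPSILON:
        return random.choice(actions)  # explore
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit
```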
