- Hands-On Q-Learning with Python
- Nazia Habib
States and actions
When first launched, your agent knows nothing about its environment and takes purely random actions.
As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty.
The car makes a note of that penalty in its Q-table. The next time it encounters a red light, it consults the Q-table when deciding what to do; because the move-forward action in the stopped-at-a-red-light state now has a lower value than any other action, the car is less likely to decide to run the red light again.
Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time.
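To make this concrete, here is a minimal sketch of how such a penalty flows into the Q-table through the standard Q-learning update rule. The state and action indices, the reward value, and the hyperparameters are illustrative assumptions, not values from the book's implementation:

```python
import numpy as np

# A minimal sketch with hypothetical state/action indices: the agent is stopped
# at a red light (state 0) and chooses "move forward" (action 0), receiving a
# large penalty. The update lowers Q[state, action], making that action less
# attractive the next time this state is seen.
n_states, n_actions = 4, 4
Q = np.zeros((n_states, n_actions))      # Q-table, initialized to zeros

alpha, gamma = 0.1, 0.9                  # learning rate and discount factor
state, action = 0, 0                     # stopped at a red light; move forward
reward = -10                             # large penalty for running the light
next_state = 1                           # the state the move leads to

# Q-learning update rule:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
print(Q[state])                          # move-forward now scores below the other actions
```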
While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine). The following is a sample finite state machine:

[Figure: a sample finite state machine]
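A state machine like this can also be written directly as a transition table in Python. The states and actions below are illustrative placeholders, not the exact contents of the figure:

```python
# A toy finite state machine written as a transition table:
# (current_state, action) -> next_state.
transitions = {
    ("stopped_at_red", "wait"): "stopped_at_green",
    ("stopped_at_red", "move_forward"): "ran_red_light",
    ("stopped_at_green", "move_forward"): "next_block",
    ("stopped_at_green", "wait"): "stopped_at_red",
}

state = "stopped_at_red"
for action in ["wait", "move_forward"]:
    state = transitions[(state, action)]
    print(f"{action} -> {state}")
```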
When we model a state-action function for any system, we decide the variables that we want to keep track of, and this lets us determine how many states the system can be in.
For example, a state variable for a vehicle might include information about what intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent this as a vector.
The possible actions for a self-driving vehicle agent might be to move forward one block, turn left, turn right, or stop and wait, and these actions are mapped to the appropriate values of the state variable.
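As a rough illustration of how the state vector and action set determine the size of the Q-table, here is a small sketch; the variable names and value ranges are assumptions chosen for the example, not the book's actual environment:

```python
from itertools import product

# Hypothetical state variables for the driving example: which intersection the
# car is at, whether its light is red or green, and whether other cars are nearby.
intersections = range(9)                          # say, a 3 x 3 grid of intersections
light = ["red", "green"]
traffic_nearby = [False, True]

# Each combination of variable values is one state, represented as a vector
# (here, a tuple); the Q-table then needs one row per state.
states = list(product(intersections, light, traffic_nearby))
actions = ["move_forward", "turn_left", "turn_right", "stop_and_wait"]

print(len(states), "states x", len(actions), "actions")   # 36 states x 4 actions
```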
Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.
In the model-free version of Q-learning, it's important to note that we do not learn an agent's policy explicitly. We only update the output values that we see as a result of that policy, which we are mapping to the state-action inputs. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm.
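To illustrate what value-based means in practice, the following sketch never stores a policy table at all: the agent keeps only Q-values and derives its behavior by (mostly) choosing the highest-valued action for the current state. The epsilon-greedy selection shown here is a common convention, used as an assumption rather than taken from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 36, 4
Q = rng.normal(size=(n_states, n_actions))   # stand-in for learned Q-values

def act(state, epsilon=0.1):
    """Epsilon-greedy selection: the policy is implicit in the Q-values."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    return int(np.argmax(Q[state]))          # exploit: best-known action

print(act(state=0))
```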