
States

Whatever we need to know about our environment is stored as part of our state, which can be represented as a vector of the variables we care about (see the sketch after this list):

  • The location (x and y coordinates)
  • The direction 
  • The color of light (red or green)
  • The other cars present (for example, one binary flag for each spot a car might be in)
  • The distance from the destination
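As a rough illustration, such a state can be assembled into a single fixed-length vector. The field names and encodings below are hypothetical; they simply mirror the bullet points above:

```python
import numpy as np

# A hypothetical state vector for the driving example above.
# The exact fields and encodings are illustrative, not prescribed by the text.
x, y = 3.0, 7.0                   # location (x and y coordinates)
heading = 90.0                    # direction, in degrees
light_is_green = 1                # color of light: 1 = green, 0 = red
nearby_car_flags = [0, 1, 0, 0]   # one binary flag per spot a car might occupy
distance_to_goal = 12.5           # distance from the destination

state = np.array([x, y, heading, light_is_green, *nearby_car_flags, distance_to_goal])
print(state.shape)  # (9,) -- one fixed-length vector describing the environment
```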

The following screenshot is from the game Pac-Man:

Taking Pac-Man as another example, we can use a state vector to represent the variables that we want to keep track of: the location of the dots left in the maze, where the Pac-Man character currently is and which direction it is moving in, the location and direction of each ghost, and whether or not the ghosts can currently be eaten.
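A structured sketch of such a state might look like the following. The field names and types are purely illustrative assumptions, not taken from any particular Pac-Man implementation:

```python
from dataclasses import dataclass
from typing import List, Tuple

# A hypothetical structured state for Pac-Man; field names are illustrative.
@dataclass
class PacManState:
    dots_remaining: List[Tuple[int, int]]    # maze coordinates of uneaten dots
    pacman_position: Tuple[int, int]         # where Pac-Man currently is
    pacman_direction: str                    # e.g. "up", "down", "left", "right"
    ghost_positions: List[Tuple[int, int]]   # one entry per ghost
    ghost_directions: List[str]
    ghosts_edible: bool                      # True while a power pellet is active

state = PacManState(
    dots_remaining=[(1, 1), (1, 2), (2, 5)],
    pacman_position=(3, 4),
    pacman_direction="left",
    ghost_positions=[(8, 8), (0, 9)],
    ghost_directions=["down", "right"],
    ghosts_edible=False,
)
```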

We can include in our state vector any variables that we think are important to our knowledge of the game. At any point in time, the state vector should capture everything we want to know about our environment.

Ideally, we should be able to look at our state vector and have all the information we need to optimally determine what action we need to take. A well-designed state space is key to an effective RL solution.

However, we can quickly see that the number of states in an environment depends on the variables we choose to keep track of; in other words, it is somewhat arbitrary. Not all algorithm designers will represent the same environment using the same state space. One thing we notice (as developers and researchers) is that even a small change in how an environment's state space is represented can make a huge difference in how difficult the problem is to solve.
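To get a feel for how quickly these choices add up, here is a back-of-the-envelope calculation using made-up numbers for the driving example: discretizing the car's position on a coarse grid versus a fine grid changes the size of the state space dramatically.

```python
# Hypothetical state-space sizes for the driving example, assuming we discretize
# position on a grid and track direction, light color, and four car flags.
def num_states(grid_size: int) -> int:
    positions = grid_size * grid_size   # x and y coordinates on the grid
    directions = 4                      # north, east, south, west
    light = 2                           # red or green
    car_flags = 2 ** 4                  # one binary flag per nearby spot
    return positions * directions * light * car_flags

print(num_states(10))    # coarse 10x10 grid -> 12,800 states
print(num_states(100))   # fine 100x100 grid -> 1,280,000 states
```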

When we use a standardized packaged environment such as the ones we'll be working with in OpenAI Gym, the state space (also called an observation space) will be determined for us. We'll also have a predetermined action space and reward structure. 
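For instance, with the classic Gym API a couple of lines are enough to inspect the spaces an environment ships with. CartPole is used here purely as a familiar example, and minor details of the API differ between Gym versions:

```python
import gym

# Inspect the predefined observation and action spaces of a packaged environment.
# (The exact return values of reset() vary between Gym versions.)
env = gym.make("CartPole-v1")

print(env.observation_space)  # Box with 4 values: cart position/velocity, pole angle/angular velocity
print(env.action_space)       # Discrete(2): push the cart left or right

observation = env.reset()     # the initial state is chosen by the environment, not by us
print(observation)
```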

One good reason to use standardized environments such as those offered by OpenAI Gym is that they allow you to compare the performance of your RL algorithms to the work of others. Having a level playing field for the state space allows us to meaningfully compare RL algorithms to each other in a way we otherwise could not.
