
Simulating the FrozenLake environment

The optimal policies for the MDPs we have dealt with so far are pretty intuitive. However, in most cases, such as the FrozenLake environment, the optimal policy is far less obvious. In this recipe, let's play around with the FrozenLake environment and get ready for upcoming recipes where we will find its optimal policy.

FrozenLake is a typical Gym environment with a discrete state space. It is about moving an agent from the starting location to the goal location in a grid world, and at the same time avoiding traps. The grid is either four by four (https://gym.openai.com/envs/FrozenLake-v0/) or eight by eight (https://gym.openai.com/envs/FrozenLake8x8-v0/). The grid is made up of the following four types of tiles:

  • S: The starting location
  • G: The goal location, which terminates an episode
  • F: The frozen tile, which is a walkable location
  • H: The hole location, which terminates an episode
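To make the tile types concrete, here is a minimal sketch in plain Python that does not use Gym itself. It assumes the default four-by-four layout that ships with Gym's FrozenLake-v0 and that states are numbered row by row from the top-left corner; the names GRID, tile_at, and is_terminal are hypothetical helpers introduced only for illustration.

```python
# Assumed layout: the default 4x4 map of Gym's FrozenLake-v0.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]
N_COLS = len(GRID[0])
N_STATES = len(GRID) * N_COLS


def tile_at(state):
    """Return the tile letter for a state index (assumed row-by-row numbering)."""
    row, col = divmod(state, N_COLS)
    return GRID[row][col]


def is_terminal(state):
    """An episode ends on the goal tile (G) or on a hole tile (H)."""
    return tile_at(state) in "GH"


# Collect the state indices of the hole tiles and of the goal tile.
holes = [s for s in range(N_STATES) if tile_at(s) == "H"]
goal = [s for s in range(N_STATES) if tile_at(s) == "G"]
print("holes:", holes, "goal:", goal)
```

Under these assumptions, the start is state 0 and the goal is the last state in the bottom-right corner.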

There are four actions: moving left (0), moving down (1), moving right (2), and moving up (3). The reward is +1 if the agent successfully reaches the goal location, and 0 otherwise. The observation space is discrete: in the four-by-four grid, an observation is a single integer from 0 to 15 that identifies the tile the agent is on, which gives 16 possible states.
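The action encoding and the reward rule can be sketched as follows. This is an illustration in plain Python rather than Gym's own code, and it ignores slipperiness. It assumes a four-by-four grid with states numbered row by row, the goal in the bottom-right corner (state 15), and that a move into a wall leaves the agent where it is; intended_move and reward_for are hypothetical names.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # action codes as listed in the text

GRID_SIZE = 4    # four-by-four version
GOAL_STATE = 15  # assumption: the goal sits in the bottom-right corner


def intended_move(state, action):
    """Return the state reached if the action succeeds exactly as intended.

    Assumption: moving into a wall keeps the agent on the same tile.
    """
    row, col = divmod(state, GRID_SIZE)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, GRID_SIZE - 1)
    elif action == RIGHT:
        col = min(col + 1, GRID_SIZE - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row * GRID_SIZE + col


def reward_for(next_state):
    """+1 for reaching the goal, 0 for everything else (including holes)."""
    return 1.0 if next_state == GOAL_STATE else 0.0


print(intended_move(0, RIGHT), intended_move(0, DOWN), reward_for(15))
```

Note that falling into a hole also yields a reward of 0, so the reward alone does not distinguish a hole from an ordinary frozen tile; only the end of the episode does.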

What is tricky in this environment is that, as the ice surface is slippery, the agent won't always move in the direction it intends. For example, it may move to the left or to the right when it intends to move down.
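The slipping behavior can be imitated with a few lines of plain Python. The sketch below assumes that the agent ends up moving in the intended direction or in one of the two perpendicular directions, each with equal probability; the text only says that sideways moves may happen, so the equal split is an assumption, and slippery_direction is a hypothetical helper rather than part of Gym.

```python
import random
from collections import Counter

LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # action codes as listed in the text


def slippery_direction(intended, rng):
    """Pick the direction actually taken on slippery ice.

    With the 0..3 encoding above, the two neighbors of an action code
    (modulo 4) are exactly the two perpendicular directions.
    Assumption: all three candidates are equally likely.
    """
    candidates = [(intended - 1) % 4, intended, (intended + 1) % 4]
    return rng.choice(candidates)


# Sample many attempts to move down and count where the agent really goes.
rng = random.Random(0)
counts = Counter(slippery_direction(DOWN, rng) for _ in range(3000))
print(counts)
```

When the agent intends to move down, the sampled directions are only ever left, down, or right; it never moves up, the opposite of what it intended.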
