
Value function

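The second component an agent can have is called the value function. As mentioned previously, it is useful to assess whether your position in a given state is good or bad. In a game of chess, a player would like to know the likelihood of winning from the current board state. An agent navigating a maze would like to know how close it is to the destination. The value function serves this purpose: it predicts the expected future reward an agent will receive starting from a given state. In other words, it measures how desirable a given state is for the agent. More formally, the value function takes a state $s$ and a policy $\pi$ as input and returns a scalar value representing the expected cumulative reward obtained by following $\pi$ from $s$:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right]$$

Here $r_{t+k+1}$ is the reward received $k+1$ steps after time $t$, and $\gamma \in [0, 1]$ is a discount factor that weights near-term rewards more heavily than distant ones; for an episodic task such as the maze below, $\gamma = 1$ simply sums the rewards.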
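To make the idea of "the value of a state under a policy" concrete, here is a minimal sketch of iterative policy evaluation, a standard dynamic-programming method (not necessarily the one this book uses later). It repeatedly replaces each state's value with the expected one-step reward plus the discounted value of the next state until the values stop changing. Everything in it is a hypothetical illustration: a five-state corridor whose last state is the goal, a reward of -1 per step, and two made-up policies, `always_right` and `uniform_random`.

```python
from typing import Callable, Dict, List

# Hypothetical 1-D corridor: states 0..n-1, the last state is the goal (terminal).
# Actions are -1 (step left) and +1 (step right); every step earns a reward of -1.
Policy = Callable[[int], Dict[int, float]]  # state -> {action: probability}


def step(state: int, action: int, n: int) -> int:
    """Deterministic transition: move left or right, staying inside the corridor."""
    return min(max(state + action, 0), n - 1)


def evaluate_policy(policy: Policy, n: int, gamma: float = 1.0,
                    tol: float = 1e-10, max_sweeps: int = 100_000) -> List[float]:
    """Iterative policy evaluation: sweep over the states, applying
    V(s) <- sum_a pi(a|s) * (r + gamma * V(s')) until no value changes by more than tol."""
    goal = n - 1
    values = [0.0] * n
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n):
            if s == goal:
                continue  # terminal state: no future reward, so its value stays 0
            new_v = sum(p * (-1.0 + gamma * values[step(s, a, n)])
                        for a, p in policy(s).items())
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v  # in-place update (uses fresh values within the sweep)
        if delta < tol:
            break
    return values


def always_right(state: int) -> Dict[int, float]:
    """Hypothetical policy: always step towards the goal."""
    return {+1: 1.0}


def uniform_random(state: int) -> Dict[int, float]:
    """Hypothetical policy: step left or right with equal probability."""
    return {-1: 0.5, +1: 0.5}


if __name__ == "__main__":
    print(evaluate_policy(always_right, 5))
    print(evaluate_policy(uniform_random, 5))
```

With `always_right` the values come out as `[-4.0, -3.0, -2.0, -1.0, 0.0]`, the negative number of steps to the goal. The random policy wanders, so the same states are worth much less (roughly -20 at the far end), which illustrates that a state's value depends on the policy being followed.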

Take our maze example, and suppose the agent receives a reward of -1 for every step it takes. The agent's goal is to finish the maze in the smallest number of steps possible. The value of each state can be represented as follows:

Figure 3: A maze where each square indicates the value of being in the state

Because every step earns a reward of -1, the value of each square is the negative of the number of steps needed to reach the end of the maze from that square. As you can see, the smallest number of steps required to reach the goal is 15.
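Values like those in Figure 3 can be computed mechanically. The sketch below assumes the agent acts optimally, so with a reward of -1 per step the value of a square is minus the fewest steps from it to the goal. Because every move costs the same, a breadth-first search outward from the goal finds those step counts. The maze layout, `MAZE`, and the helpers `state_values` and `render` are hypothetical and are not the maze from Figure 3.

```python
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical maze ('#' = wall, 'S' = start, 'G' = goal, '.' = open floor).
MAZE = [
    "S.#G",
    ".##.",
    "....",
]


def state_values(maze: List[str]) -> Dict[Cell, int]:
    """Value of each reachable cell when every step earns -1 and the agent acts
    optimally: V(s) = -(fewest steps from s to the goal). Computed by breadth-first
    search outward from the goal, which is valid because all moves cost the same."""
    rows, cols = len(maze), len(maze[0])
    goal = next((r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "G")
    steps = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#"
                    and (nr, nc) not in steps):
                steps[(nr, nc)] = steps[(r, c)] + 1
                queue.append((nr, nc))
    return {cell: -n for cell, n in steps.items()}


def render(maze: List[str], values: Dict[Cell, int]) -> str:
    """Lay the values out as a grid, in the spirit of Figure 3."""
    lines = []
    for r, row in enumerate(maze):
        cells = []
        for c, ch in enumerate(row):
            if ch == "#":
                cells.append("  #")
            elif (r, c) in values:
                cells.append(f"{values[(r, c)]:3d}")
            else:
                cells.append("  ?")  # open cell from which the goal is unreachable
        lines.append(" ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(MAZE, state_values(MAZE)))
```

For this layout the start square `S` gets a value of -7, because the shortest route around the walls takes seven steps.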

How can the value function help an agent perform a task well, other than telling us how desirable a given state is? As we will see in the following sections, value functions play an integral role in predicting how well a sequence of actions will do even before the agent performs them. This is similar to a chess player imagining how a sequence of future moves would improve their chances of winning. To do this, however, the agent also needs an understanding of how the environment works. This is where the third component of an agent, the model, becomes relevant.
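The sketch below illustrates how a value function and a model combine to score actions before taking them. The agent asks its model what would happen (next state and reward), adds the value of the predicted end state, and compares the alternatives without ever acting in the real environment. All names and numbers are hypothetical: a five-state corridor, a reward of -1 per step, an assumed-perfect `model`, and a hand-written value table `VALUES` equal to minus the steps to the goal.

```python
from typing import Dict, List, Sequence, Tuple

N_STATES = 5          # hypothetical corridor: states 0..4
GOAL = N_STATES - 1   # reaching state 4 ends the episode
ACTIONS = (-1, +1)    # step left / step right

# Hypothetical value table (reward -1 per step, no discounting): V(s) = -(steps to goal).
VALUES: Dict[int, float] = {s: float(-(GOAL - s)) for s in range(N_STATES)}


def model(state: int, action: int) -> Tuple[int, float]:
    """The agent's (assumed perfect) model of the environment: predicts the next
    state and reward without actually taking the action."""
    if state == GOAL:
        return state, 0.0  # terminal: nothing more happens
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -1.0


def imagine(state: int, plan: Sequence[int]) -> float:
    """Score an imagined action sequence: the model's predicted rewards along the
    way plus the value of the state the plan ends in."""
    total = 0.0
    for action in plan:
        state, reward = model(state, action)
        total += reward
    return total + VALUES[state]


def greedy_action(state: int) -> int:
    """One-step lookahead: pick the action with the largest predicted
    reward + value of the predicted next state."""
    def score(action: int) -> float:
        next_state, reward = model(state, action)
        return reward + VALUES[next_state]
    return max(ACTIONS, key=score)


def greedy_plan(state: int, max_steps: int = 20) -> List[int]:
    """Build a whole plan in imagination by repeatedly applying greedy_action."""
    plan: List[int] = []
    while state != GOAL and len(plan) < max_steps:
        action = greedy_action(state)
        plan.append(action)
        state, _ = model(state, action)
    return plan


if __name__ == "__main__":
    print(imagine(1, [+1, +1]))   # two steps right, then rely on V(3)
    print(imagine(1, [-1, +1]))   # a detour that wastes two steps
    print(greedy_plan(1))
```

Here `imagine(1, [+1, +1])` evaluates to `-3.0` while the detour `imagine(1, [-1, +1])` evaluates to `-5.0`, so the agent can rank the two plans before moving. Without `model`, the agent could not predict where an action leads, and the value table alone would not be enough to choose between them.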
