Discounted cumulative reward

In the previous section, we said that the goal of reinforcement learning is to learn a policy that, for each state s the system can be in, tells the agent which action to take so as to maximize the total reward received over the entire action sequence. How can we maximize this total reinforcement?

The total reinforcement obtained by following the policy from time step t onward is calculated as follows:

$$R_t = r_{t+1} + r_{t+2} + \dots + r_T$$

Here, r_T is the reward for the action that drives the environment into the terminal state s_T.
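
As a minimal sketch of this sum, assume the rewards of one finite episode have already been collected in a Python list. The function name `rewards_to_go` and the sample values are hypothetical and do not come from the text. The total reward from each time step to the end of the episode can then be accumulated backwards:

```python
def rewards_to_go(rewards):
    """Return the undiscounted sum of rewards from each step to the end.

    rewards[k] is assumed to be the reward received after the k-th action
    of a finite episode; the last entry plays the role of r_T.
    """
    totals = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each total reuses the one computed for the next step.
    for k in range(len(rewards) - 1, -1, -1):
        running += rewards[k]
        totals[k] = running
    return totals


episode_rewards = [1.0, 0.0, -2.0, 5.0]  # hypothetical episode
print(rewards_to_go(episode_rewards))    # [4.0, 3.0, 3.0, 5.0]
```

The first entry of the result is the total reinforcement for the whole episode; later entries show what remains to be collected from each intermediate step.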

One possible approach is to associate with each state the action that yields the highest reward; that is, we must determine an optimal policy that maximizes the previous quantity.

For problems that do not reach a goal or terminal state in a finite number of steps (continuing tasks), R_t can tend to infinity.

In these cases, the sum of the rewards to be maximized diverges, so this approach is not applicable. An alternative reinforcement technique is therefore needed.

The technique that best suits the reinforcement learning paradigm turns out to be the discounted cumulative reward, which tries to maximize the following quantity:

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$

Here, γ is called the discount factor and represents the importance of future rewards. This parameter can take values in the range 0 ≤ γ ≤ 1, with the following effects:

  • If γ < 1, the discounted sum R_t converges to a finite value, provided the individual rewards are bounded
  • If γ = 0, the agent has no interest in future rewards and tries to maximize only the immediate reward
  • If γ = 1, future rewards count as much as immediate ones, so the agent will try to increase future rewards even at the expense of the immediate ones
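
The three cases can be checked numerically. The following is a minimal sketch, assuming a hypothetical continuing task that pays a reward of 1 at every step and is truncated at increasing horizons. The function name `discounted_return` and the reward stream are illustrative assumptions, not part of the original text:

```python
def discounted_return(rewards, gamma):
    """Sum rewards weighted by gamma**k, where k is the delay in steps."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical continuing task that pays a reward of 1 at every step,
# truncated at increasing horizons to show how the sum behaves.
for horizon in (10, 100, 1000):
    stream = [1.0] * horizon
    print(
        horizon,
        discounted_return(stream, 0.0),            # only the first reward counts
        round(discounted_return(stream, 0.9), 4),  # approaches 1 / (1 - 0.9) = 10
        discounted_return(stream, 1.0),            # grows with the horizon
    )
```

With γ = 0.9 the value levels off near 10 as the horizon grows, whereas with γ = 1 it keeps increasing with the number of steps, which is the divergence that discounting is meant to avoid.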

The discount factor can be modified during the learning process to highlight particular actions or states. Under an optimal policy, the reinforcement obtained from a single action may be low (or even negative), provided that the action leads to greater reinforcement overall.
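
To illustrate this trade-off, the sketch below compares two hypothetical reward sequences that start from the same state: one takes the best immediate reward, the other pays an initial cost to reach a larger reward later. The sequences, the names `greedy` and `patient`, and the chosen values of γ are assumptions made for illustration only:

```python
def discounted_return(rewards, gamma):
    """Discounted sum: rewards[k] is weighted by gamma**k."""
    return sum(gamma**k * r for k, r in enumerate(rewards))


# Hypothetical reward sequences for two courses of action from the same state.
greedy = [2.0, 0.0, 0.0, 0.0]     # best immediate reward, nothing afterwards
patient = [-1.0, 0.0, 0.0, 10.0]  # pays a cost first, large reward later

for gamma in (0.0, 0.5, 0.9):
    g = discounted_return(greedy, gamma)
    p = discounted_return(patient, gamma)
    better = "patient" if p > g else "greedy"
    print(f"gamma={gamma}: greedy={g:.3f}, patient={p:.3f}, better={better}")
```

For small γ the sequence with the higher immediate reward scores better; at γ = 0.9 the sequence that starts with a negative reward has the larger discounted return, so which behavior counts as optimal depends on the discount factor.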
