
Rewards

In the RL literature, the reward received at time instant t is typically denoted r_t. The total reward earned in an episode is then given by R = r_1 + r_2 + ... + r_T, where T is the length of the episode (which can be finite or infinite).

RL uses the concept of discounting, controlled by a parameter called the discount factor, typically represented by γ with 0 ≤ γ ≤ 1; a reward k steps into the future is weighted by γᵏ. Setting γ = 0 makes the agent myopic, aiming only for immediate rewards, while γ = 1 makes the agent so far-sighted that it procrastinates the accomplishment of the final goal. Thus, a value of γ strictly between 0 and 1 is used to ensure that the agent is neither too myopic nor too far-sighted. γ ensures that the agent prioritizes its actions to maximize the total discounted reward from time instant t, denoted R_t, which is given by the following:

R_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ... = Σ_{k=0}^{∞} γᵏ·r_{t+k+1}

Since 0 < γ < 1, rewards in the distant future are valued much less than rewards the agent can earn in the immediate future. This helps the agent avoid wasting time and prioritize its actions. In practice, a γ in the range 0.9-0.99 is typically used in most RL problems.
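
To make the formula concrete, the following minimal Python sketch (not from the book; the function name and sample rewards are illustrative) computes the discounted return by folding rewards in from the end of the episode, which is equivalent to summing γᵏ·r_{t+k+1}:

def discounted_return(rewards, gamma=0.99):
    """Compute R_t = sum over k of gamma^k * r_{t+k+1} for a list of rewards."""
    total = 0.0
    # Iterate backwards so each step adds its reward plus the discounted future return.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Example: a three-step episode with rewards 1, 0, and 5 and gamma = 0.9
# gives 1 + 0.9*0 + 0.81*5 = 5.05.
print(discounted_return([1.0, 0.0, 5.0], gamma=0.9))

The backward pass avoids recomputing powers of γ explicitly: each step's return is its immediate reward plus γ times the return of the step after it.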
