
Gamma – current versus future rewards

Let's discuss the concept of current rewards versus future rewards. Your agent's discount rate, gamma, has a value between zero and one, and it controls how much weight the agent gives to future rewards relative to immediate rewards.

Your agent decides what action to take based not only on the reward it expects to get for taking that action, but also on the future rewards it might be able to get from the state it will be in after taking that action.
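One common way to express this idea is a one-step lookahead: score each action by its immediate reward plus gamma times the best value estimated for the state it leads to. The following is a minimal sketch under that assumption. The names `action_score`, `immediate_reward`, and `next_state_values` are hypothetical, and the numbers are made up for illustration.

```python
def action_score(immediate_reward, next_state_values, gamma):
    """Immediate reward plus the discounted best value available from the next state."""
    # If nothing is known about the next state, assume it is worth 0.
    best_future = max(next_state_values) if next_state_values else 0.0
    return immediate_reward + gamma * best_future


# Hypothetical numbers: action "a" pays 1 now and leads to a state worth at most 0;
# action "b" pays 0 now and leads to a state worth up to 3.
gamma = 0.9
score_a = action_score(1.0, [0.0, 0.0], gamma)
score_b = action_score(0.0, [3.0, 1.0], gamma)
best_action = "a" if score_a >= score_b else "b"
print(score_a, score_b, best_action)
```

With these made-up values, the action with no immediate payoff scores higher, because the discounted value of the state it leads to outweighs the smaller immediate reward.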

One easy way to illustrate discounting rewards is with the example of a mouse in a maze that collects cheese as rewards while avoiding cats and traps (that is, electric shocks).

The rewards that are closest to the cats, even though their point values are higher (three versus one), should be discounted if we want to maximize how long the mouse agent lives and how much cheese it can collect. These rewards come with a higher risk of the mouse being killed, so we lower their value accordingly. In other words, collecting the closest cheese should be given a higher priority when the mouse decides what actions to take.

When we discount a future reward, we make it less valuable than an immediate reward (similar to how we take into account the time value of money when making a loan and treat a dollar received today as more valuable than a dollar received a year from now).
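In the standard formulation (assumed here; the symbols G_t and r do not appear in the text above), a reward received k steps further into the future is weighted by gamma raised to the power k:

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad 0 \le \gamma \le 1
```

For example, with gamma set to 0.9, a reward of 3 that arrives two steps later than an immediate reward counts as 0.9² × 3 = 2.43 today.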

The value of gamma that we choose varies according to how highly we value future rewards:

  • If we choose a value of zero for gamma, the agent will not care about future rewards at all and will only take current rewards into account.
  • If we choose a value of one for gamma, the agent will value future rewards as highly as current rewards.
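The two extremes can be compared on a short reward sequence. This is a minimal sketch, assuming the usual convention that the reward k steps ahead is weighted by gamma to the power k. The function name `discounted_return` and the reward values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward k steps ahead by gamma ** k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Hypothetical sequence: 1 point now, nothing on the next step, 3 points after that.
rewards = [1.0, 0.0, 3.0]

for gamma in (0.0, 0.5, 1.0):
    print(f"gamma={gamma}: return={discounted_return(rewards, gamma)}")
```

With gamma at zero only the immediate reward counts, with gamma at one all three rewards are summed at full value, and an intermediate value such as 0.5 shrinks the later reward without ignoring it.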