
Rewards and returns

As we have learned, in an RL environment, an agent interacts with the environment by performing an action and moving from one state to another. Based on the action it performs, it receives a reward. A reward is simply a numerical value, say, +1 for a good action and -1 for a bad action. How do we decide whether an action is good or bad? In a maze game, a good action is one where the agent moves without hitting a maze wall, whereas a bad action is one where the agent moves and hits a maze wall.
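The maze example can be sketched in a few lines of Python. This is only an illustrative sketch: the grid layout, the `MAZE` and `MOVES` names, and the `step` function are hypothetical, and it assumes that an agent that hits a wall stays in its current cell.

```python
# Hypothetical maze layout: '#' marks a wall, '.' marks a free cell.
MAZE = [
    "#####",
    "#..##",
    "#.#.#",
    "#...#",
    "#####",
]

# Each action maps to a (row, column) offset.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(position, action):
    """Return (next_position, reward) for one move in the maze."""
    row, col = position
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if MAZE[new_row][new_col] == "#":
        # Bad action: the agent hits a wall and stays where it was (assumption).
        return position, -1
    # Good action: the agent moves into a free cell.
    return (new_row, new_col), +1


print(step((1, 1), "right"))  # moves into a free cell
print(step((1, 1), "up"))     # hits a wall
```

Here the reward depends only on whether the move hits a wall; a real environment could just as well use other values or reward reaching a goal.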

An agent tries to maximize the total amount of reward (the cumulative reward) it receives from the environment rather than the immediate reward. The total amount of reward the agent receives from the environment is called the return. So, we can formulate the total amount of reward (the return) received by the agent as follows:

 

$$R_t = r_{t+1} + r_{t+2} + \dots + r_T$$

Here, $r_{t+1}$ is the reward the agent receives after performing an action at time step $t$ to move from one state to another. $r_{t+2}$ is the reward it receives after performing an action at the next time step, $t+1$, to move from one state to another. Similarly, $r_T$ is the reward the agent receives at the final time step $T$ while performing an action to move from one state to another.
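Since the return is just the sum of the rewards collected up to the final time step, it is easy to compute once an episode's rewards are known. The following is a minimal sketch; the function names and the sample reward list are made up for illustration.

```python
def compute_return(rewards):
    """Sum all rewards collected over an episode."""
    return sum(rewards)


def returns_from_each_step(rewards):
    """Return, for every time step, the sum of rewards from that step onward."""
    returns = []
    running_total = 0
    # Walk backward so each step reuses the total of the steps after it.
    for reward in reversed(rewards):
        running_total += reward
        returns.append(running_total)
    returns.reverse()
    return returns


# Hypothetical episode: +1 for each good move, -1 for hitting a wall.
episode_rewards = [+1, +1, -1, +1, +1]

print(compute_return(episode_rewards))          # 3
print(returns_from_each_step(episode_rewards))  # [3, 2, 1, 2, 1]
```

The first entry of the per-step list equals the return of the whole episode, and each later entry is the return measured from that time step onward.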
