- Python Reinforcement Learning
- Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
- 2021-06-24 15:17:31
Rewards and returns
As we have learned, in an RL environment, an agent interacts with the environment by performing an action that moves it from one state to another. Based on the action it performs, it receives a reward. A reward is simply a numerical value, say, +1 for a good action and -1 for a bad action. How do we decide whether an action is good or bad? In a maze game, a good action is a move that does not hit a maze wall, whereas a bad action is a move that does hit a wall.
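The maze example can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the grid layout, the `MOVES` table, and the `step` function are hypothetical names chosen here, and treating the outer boundary like a wall is an added assumption.

```python
# Hypothetical 4x4 maze: '#' is a wall, '.' is a free cell.
MAZE = [
    "....",
    ".#..",
    "..#.",
    "....",
]

# Hypothetical action names mapped to (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for one move in the maze."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col

    inside = 0 <= new_row < len(MAZE) and 0 <= new_col < len(MAZE[0])
    if not inside or MAZE[new_row][new_col] == "#":
        # Bad action: the agent hit a wall (or, by assumption, the outer
        # boundary), so it stays where it is and receives -1.
        return state, -1
    # Good action: the move succeeded, so the agent receives +1.
    return (new_row, new_col), +1


print(step((0, 0), "right"))  # free cell
print(step((0, 1), "down"))   # wall at row 1, column 1
```

Under these assumptions, the first call moves the agent and yields a reward of +1, while the second leaves it in place with a reward of -1.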
An agent tries to maximize the total amount of reward (the cumulative reward) it receives from the environment rather than the immediate reward. The total amount of reward the agent receives from the environment is called the return. We can formulate the return received by the agent as follows:

$$R_t = r_{t+1} + r_{t+2} + \dots + r_T$$

Here, $r_{t+1}$ is the reward received by the agent for the action it performs at time step $t$ to move from one state to another. $r_{t+2}$ is the reward received for the action it performs at the next time step. Similarly, $r_T$ is the reward received for the last action, which brings the agent to the final time step $T$.
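Since the return is just the sum of the rewards collected from a given time step until the final one, it is easy to compute once an episode has finished. The following is a minimal sketch; the function names and the reward sequence are made up for illustration and do not come from the book.

```python
def compute_return(rewards):
    """Sum all rewards collected from the current step to the final step."""
    return sum(rewards)


def returns_per_step(rewards):
    """Return, for every time step, the sum of rewards from that step onward."""
    returns = []
    running = 0
    # Walk backwards so each return reuses the one computed for the next step.
    for reward in reversed(rewards):
        running += reward
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: four good moves and one move into a wall.
episode_rewards = [1, 1, -1, 1, 1]

print(compute_return(episode_rewards))    # 3
print(returns_per_step(episode_rewards))  # [3, 2, 1, 2, 1]
```

The first value in the list equals the return for the whole episode, and each later value is the return the agent can still collect from that step onward.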