Defining the Bellman equation
The Bellman equation, named after the great computer scientist and applied mathematician Richard E. Bellman, is an optimality condition associated with dynamic programming. It is widely used in RL to update the policy of an agent.
Let's define the following two quantities:

$$P_{ss'}^{a} = \Pr\left(s_{t+1} = s' \mid s_t = s, a_t = a\right)$$

$$R_{ss'}^{a} = \mathbb{E}\left[r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\right]$$

The first quantity, $P_{ss'}^{a}$, is the probability of transitioning from state $s$ to the new state $s'$ under action $a$. The second quantity, $R_{ss'}^{a}$, is the expected reward the agent receives when it starts in state $s$, takes action $a$, and moves to the new state $s'$. Note that we have assumed the Markov property: the transition to the state at time $t+1$ depends only on the state and action at time $t$. Stated in these terms, the Bellman equation is a recursive relationship, given by the following equations for the value function and the action-value function, respectively:

$$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma V^{\pi}(s') \right]$$

$$Q^{\pi}(s, a) = \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma \sum_{a'} \pi(s', a') Q^{\pi}(s', a') \right]$$

Here, $\pi$ is the policy being followed and $\gamma \in [0, 1]$ is the discount factor. Note that the Bellman equations express the value function $V$ at a state in terms of the value function at other states, and similarly the action-value function $Q$ at a state-action pair in terms of $Q$ at other state-action pairs.
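To make the recursion concrete, here is a minimal sketch of iterative policy evaluation in Python with NumPy. The three-state, two-action MDP, the uniform policy, and the names `P`, `R`, `pi`, and `gamma` are illustrative assumptions, not from this book; the sketch simply applies the two Bellman equations above to tabular arrays:

```python
# Minimal sketch: iterative policy evaluation on a made-up tabular MDP.
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# P[s, a, s'] = transition probability; normalized so each (s, a) row sums to 1.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)

# R[s, a, s'] = expected reward for the transition (s, a) -> s'.
R = rng.random((n_states, n_actions, n_states))

# pi[s, a] = probability of taking action a in state s (uniform policy here).
pi = np.full((n_states, n_actions), 1.0 / n_actions)

gamma = 0.9  # discount factor

# Repeatedly apply the Bellman backup for V until it stops changing:
# V(s) = sum_a pi(s, a) * sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V(s'))
V = np.zeros(n_states)
delta = np.inf
while delta > 1e-8:
    V_new = np.einsum("sa,sap,sap->s", pi, P, R + gamma * V)
    delta = np.max(np.abs(V_new - V))
    V = V_new

# Q follows from the same quantities, since V(s') = sum_a' pi(s', a') Q(s', a'):
# Q(s, a) = sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V(s'))
Q = np.einsum("sap,sap->sa", P, R + gamma * V)
print("V:", V)
print("Q:", Q)
```

For $\gamma < 1$, the backup is a contraction, so repeating it converges to $V^{\pi}$ regardless of the starting values; $Q^{\pi}$ then follows in a single pass from the converged $V^{\pi}$.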