Defining the Bellman equation

The Bellman equation, named after the great applied mathematician Richard E. Bellman, is a necessary condition for optimality associated with dynamic programming. It is widely used in RL to update the policy of an agent.

Let's define the following two quantities: 
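In a standard notation (assuming a discrete state space and action space), they can be written as:

P^{a}_{s,s'} = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)

R^{a}_{s,s'} = \mathbb{E}[\, r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s' \,]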

The first quantity, P^{a}_{s,s'}, is the probability of transitioning from state s to the new state s' when taking action a. The second quantity, R^{a}_{s,s'}, is the expected reward the agent receives when starting in state s, taking action a, and moving to the new state s'. Note that we have assumed the Markov property of the MDP, that is, the transition to the state at time t+1 depends only on the state and action at time t. Stated in these terms, the Bellman equation is a recursive relationship, given by the following equations for the value function and action-value function, respectively:
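Here, in a standard form (with π the agent's policy and γ the discount factor), the two recursions can be written as:

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P^{a}_{s,s'} \left[ R^{a}_{s,s'} + \gamma V^{\pi}(s') \right]

Q^{\pi}(s,a) = \sum_{s'} P^{a}_{s,s'} \left[ R^{a}_{s,s'} + \gamma \sum_{a'} \pi(a' \mid s') Q^{\pi}(s',a') \right]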

Note that the Bellman equations express the value function V at a state as a function of the value function at other states, and similarly for the action-value function Q.
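To make the recursion concrete, the following is a minimal NumPy sketch that applies the Bellman expectation equation as an iterative backup (policy evaluation) on a small, made-up two-state MDP; the transition probabilities, rewards, and variable names are illustrative assumptions, not taken from this book's code.

import numpy as np

# Illustrative two-state, two-action MDP (all numbers are made up).
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

# P[a, s, s'] : transition probability P^a_{s,s'}
P = np.array([
    [[0.8, 0.2],   # action 0, from state 0
     [0.1, 0.9]],  # action 0, from state 1
    [[0.5, 0.5],   # action 1, from state 0
     [0.3, 0.7]],  # action 1, from state 1
])

# R[a, s, s'] : expected reward R^a_{s,s'} for the transition s -> s' under action a
R = np.array([
    [[ 1.0, 0.0],
     [ 0.0, 2.0]],
    [[ 0.5, 0.5],
     [-1.0, 1.0]],
])

# policy[s, a] : probability of taking action a in state s (uniform random policy)
policy = np.full((n_states, n_actions), 0.5)

# Repeatedly apply the Bellman expectation backup until V stops changing:
# V(s) = sum_a pi(a|s) * sum_s' P^a_{s,s'} * (R^a_{s,s'} + gamma * V(s'))
V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            backup = np.sum(P[a, s] * (R[a, s] + gamma * V))
            V_new[s] += policy[s, a] * backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Q(s, a) follows from the same quantities:
# Q(s, a) = sum_s' P^a_{s,s'} * (R^a_{s,s'} + gamma * V(s'))
Q = np.array([[np.sum(P[a, s] * (R[a, s] + gamma * V))
               for a in range(n_actions)]
              for s in range(n_states)])

print("V:", V)
print("Q:\n", Q)

Running this converges to the fixed point of the recursion for the given policy; taking a max over actions in place of the expectation under π would instead give the Bellman optimality equation used by value iteration.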
