- TensorFlow Reinforcement Learning Quick Start Guide
- Kaushik Balakrishnan
Learning the Markov decision process
The Markov property is widely used in RL. It states that the environment's response at time $t+1$ depends only on the state and action at time $t$; in other words, the immediate future depends only on the present, not on the past. This property simplifies the math considerably and is exploited in many fields, such as RL and robotics.
Consider a system that transitions from state $s_0$ to $s_1$ by taking an action $a_0$ and receiving a reward $r_1$, then from $s_1$ to $s_2$ by taking action $a_1$, and so on until time $t$. If the probability of being in a state $s'$ at time $t+1$ can be written as the following function of only the latest state and action, the system is said to follow the Markov property:

$$P(s_{t+1} = s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = P(s_{t+1} = s' \mid s_t, a_t)$$
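To see what this buys us computationally, here is a minimal Python sketch (the two-state, two-action transition table is hypothetical, invented just for this illustration) in which the next state is sampled from a distribution indexed only by the current state and action; the earlier history of the trajectory never enters the computation:

```python
import numpy as np

# Hypothetical transition table: P[s, a] is the distribution over next
# states given the current state s and action a (each row sums to 1).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # state 0, actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],   # state 1, actions 0 and 1
])

rng = np.random.default_rng(seed=0)

def step(s, a):
    # Sample s_{t+1} ~ P(. | s_t, a_t): only (s, a) matter,
    # not how we arrived at s.
    return rng.choice(2, p=P[s, a])

s = 0
for a in (0, 1, 1, 0):
    s = step(s, a)  # past states are never consulted
```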
Note that the probability of being in state $s_{t+1}$ depends only on $s_t$ and $a_t$, and not on the past. An environment whose state transition probability and reward function take the following forms is said to be a Markov Decision Process (MDP):

$$P(s' \mid s, a) = P(s_{t+1} = s' \mid s_t = s, a_t = a)$$

$$R(s, a, s') = \mathbb{E}\left[ r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s' \right]$$
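As an illustration, the following sketch stores these two quantities explicitly for a made-up three-state, two-action MDP (all probabilities and rewards here are randomly generated for the example, not taken from the text) and rolls out the $s_0, a_0, r_1, s_1, a_1, \ldots$ sequence described above:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_states, n_actions = 3, 2

# Hypothetical MDP: P[s, a] is the state-transition distribution P(s'|s, a),
# and R[s, a, s'] is the expected reward for that transition.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions, n_states))

def rollout(s, policy, horizon):
    # Generate (s_t, a_t, r_{t+1}) tuples following the MDP dynamics.
    trajectory = []
    for _ in range(horizon):
        a = policy(s)
        s_next = rng.choice(n_states, p=P[s, a])
        trajectory.append((s, a, R[s, a, s_next]))
        s = s_next
    return trajectory

# Roll out five steps under a uniformly random policy.
print(rollout(0, policy=lambda s: rng.integers(n_actions), horizon=5))
```

In practice, tabular `P` and `R` like these are only feasible for tiny problems; RL algorithms typically learn from sampled transitions rather than assuming these tables are known.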
Let's now define the very foundation of RL: the Bellman equation. This equation provides an iterative way of computing value functions.