
RL algorithm

The steps involved in a typical RL algorithm are as follows:

  1. First, the agent interacts with the environment by performing an action
  2. As a result of that action, the agent moves from one state to another
  3. The agent then receives a reward based on the action it performed
  4. Based on the reward, the agent learns whether the action was good or bad
  5. If the action was good, that is, if the agent received a positive reward, the agent will prefer performing that action; otherwise, it will try other actions that may result in a positive reward. It is basically a trial-and-error learning process
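
The steps above can be sketched as a small loop. The following is a minimal illustration, not a definitive implementation: the five-state corridor environment, the `step` and `train` functions, the reward values, and the hyperparameters are all assumptions made up for this example. The update rule used here is tabular Q-learning with epsilon-greedy exploration, which is one common way to realize trial-and-error learning.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, with the goal at state 4
ACTIONS = [-1, +1]  # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment (an assumption): return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1  # positive reward only when the goal is reached
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates how good each action is in each state
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Steps 1-2: choose an action (mostly the preferred one, sometimes a random one)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            # Steps 2-3: the agent moves to a new state and receives a reward
            next_state, reward, done = step(state, ACTIONS[a])
            # Steps 4-5: raise or lower the estimate for this action based on the reward
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Preferred action in each non-goal state after training
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

Actions that lead toward a positive reward end up with higher estimates, so the agent prefers them, while the occasional random action lets it keep trying alternatives.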