
RL algorithm

The steps involved in a typical RL algorithm are as follows:

  1. First, the agent interacts with the environment by performing an action
  2. As a result of that action, the agent moves from one state to another
  3. The agent then receives a reward based on the action it performed
  4. Based on the reward, the agent learns whether the action was good or bad
  5. If the action was good, that is, if the agent received a positive reward, the agent will prefer performing that action; otherwise, it will try another action that might result in a positive reward. It is basically a trial-and-error learning process
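
The steps above can be sketched as a small Python loop. This is only an illustrative sketch, not a specific published algorithm: the corridor environment, the `step` and `run` functions, the reward values, and the `epsilon` exploration rate are all assumptions made up for this example. The agent keeps a running average of the reward it has seen for each state-action pair, usually repeats the action with the best average, and occasionally tries a different one.

```python
import random

N_STATES = 5                  # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = ["left", "right"]   # hypothetical action set


def step(state, action):
    """Hypothetical environment: return (next_state, reward) for an action."""
    if action == "right":
        # Moving toward the goal is treated as a good action.
        return min(state + 1, N_STATES - 1), 1.0
    # Moving away from the goal is treated as a bad action.
    return max(state - 1, 0), -1.0


def run(episodes=200, epsilon=0.1, seed=0):
    """Trial-and-error learning of the average reward per (state, action)."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    count = {(s, a): 0 for s in range(N_STATES) for a in ACTIONS}

    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap the episode length
            # Steps 1 and 5: choose an action, preferring the one that has
            # paid off so far but sometimes trying another one.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: value[(state, a)])

            # Steps 2 and 3: move to a new state and receive a reward.
            next_state, reward = step(state, action)

            # Step 4: update the running average reward for this action.
            count[(state, action)] += 1
            value[(state, action)] += (
                reward - value[(state, action)]
            ) / count[(state, action)]

            state = next_state
            if state == N_STATES - 1:  # goal reached, end the episode
                break
    return value
```

Calling `run()` returns the learned table of average rewards. Under these assumptions, the entry for `"right"` ends up higher than the entry for `"left"` in every non-goal state, which is what makes the agent prefer that action.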