Basic terminologies and conventions

The following are the basic terms used in reinforcement learning:

  • Agent: The program we create to sense the environment, perform actions, receive feedback, and try to maximize rewards.
  • Environment: The world where the agent resides. It can be real or simulated.
  • State: The perception or configuration of the environment that the agent senses. State spaces can be finite or infinite.
  • Rewards: Feedback the agent receives after each action it takes. The goal of the agent is to maximize the overall reward, that is, the sum of the immediate and future rewards. Rewards are defined in advance, so they must be designed carefully for the agent to achieve the goal efficiently.
  • Actions: Anything that the agent is capable of doing in the given environment. Action space can be finite or infinite.
  • SAR triple: A (state, action, reward) tuple is referred to as an SAR triple, written (s, a, r).
  • Episode: Represents one complete run of the whole task.
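To make these terms concrete, here is a minimal Python sketch. It assumes a hypothetical toy environment, `CorridorEnv` (a row of cells the agent walks along), and an assumed reward design of +1 at the goal and -0.1 per other step; neither comes from the text or from any library. The agent is a policy function, each step yields one SAR triple, and `run_episode` plays one complete episode.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1 in a row.

    The agent starts in cell 0 and the episode ends when it reaches the last cell.
    """

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action: int):
        """Apply an action; return (next_state, reward, done)."""
        # Finite state space: clamp the position to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward design: +1 at the goal, -0.1 for every other step,
        # so reaching the goal in fewer steps yields a higher total reward.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


ACTIONS = (-1, +1)  # finite action space: move left or move right


def random_policy(state: int) -> int:
    """An agent that ignores the state and acts at random."""
    return random.choice(ACTIONS)


def run_episode(env, policy, max_steps: int = 100):
    """Run one episode and collect its (state, action, reward) triples."""
    triples = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        triples.append((state, action, reward))  # one SAR triple per step
        state = next_state
        if done:
            break
    return triples
```

With a policy that always moves right, `run_episode(CorridorEnv(), lambda s: +1)` returns `[(0, 1, -0.1), (1, 1, -0.1), (2, 1, -0.1), (3, 1, 1.0)]`, a total reward of about 0.7. Because `random_policy` needs at least as many steps to reach the goal, its total reward is never higher, which is exactly what the reward design is meant to encourage.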

Let's walk through the convention shown in the following diagram:

Every task is a sequence of SAR triples. We start in state S(t), perform action A(t), receive a reward R(t+1), and land in a new state S(t+1). The reward for the current state-action pair therefore arrives at the next time step, which is why it is indexed t+1. Since S(t) and A(t) lead to S(t+1), we also obtain the triple (current state, action, new state), that is, [S(t), A(t), S(t+1)], or (s, a, s') for short.
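The indexing convention above, where the reward for acting in S(t) is labelled R(t+1), is easy to get off by one in code. The following minimal Python sketch assumes an episode stored as three parallel lists (a layout chosen here for illustration, not taken from the text) and turns it into (s, a, r, s') transitions; `Transition` and `to_transitions` are hypothetical names.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    s: int        # S(t), the current state
    a: int        # A(t), the action taken in S(t)
    r: float      # R(t+1), the reward that follows A(t)
    s_next: int   # S(t+1), the state the agent lands in


def to_transitions(states, actions, rewards):
    """Pair each step of a recorded episode with its successor state.

    Assumed layout: states = [S(0), ..., S(T)], actions = [A(0), ..., A(T-1)],
    and rewards = [R(1), ..., R(T)], so rewards[t] holds R(t+1).
    """
    if not (len(actions) == len(rewards) == len(states) - 1):
        raise ValueError("expected T+1 states, T actions and T rewards")
    return [
        Transition(states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]


# Hypothetical recorded episode, values chosen only for illustration.
states = [0, 1, 0, 1, 2]
actions = [+1, -1, +1, +1]
rewards = [0.0, 0.0, 0.0, 1.0]

for tr in to_transitions(states, actions, rewards):
    print(tr)
```

The first line printed is `Transition(s=0, a=1, r=0.0, s_next=1)`, the (s, a, r, s') tuple for t = 0. Note that each transition's `s_next` is the next transition's `s`, so consecutive transitions chain together into the full episode.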
