Identifying reward functions and the concept of discounted rewards

Rewards in RL are no different from real-world rewards: we all receive good rewards for doing well and bad rewards (penalties) for inferior performance. The environment provides a reward function to guide the agent's learning as it explores. Specifically, the reward is a measure of how well the agent is performing.

The reward function defines the good and bad things that can happen to the agent. For instance, a mobile robot is rewarded for reaching its goal but penalized for crashing into obstacles. Likewise, an industrial robot arm is rewarded for putting a peg into a hole but penalized for entering undesired poses that could be catastrophic, causing ruptures or crashes. The reward function signals to the agent what is optimal and what is not. The agent's long-term goal is to maximize rewards and minimize penalties.
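
The mobile robot example can be sketched as a small reward function. This is only an illustration: the `RobotState` class, its fields, and the numeric reward values are hypothetical choices, not taken from any particular environment or library.

```python
from dataclasses import dataclass

# Hypothetical reward values, chosen for illustration only.
GOAL_REWARD = 1.0      # reward for reaching the goal
CRASH_PENALTY = -1.0   # penalty for crashing into an obstacle
STEP_REWARD = 0.0      # neutral signal for every other step


@dataclass
class RobotState:
    """Hypothetical summary of what happened to the robot on one step."""
    reached_goal: bool
    crashed: bool


def reward(state: RobotState) -> float:
    """Return the reward the environment would give for this state."""
    if state.crashed:
        return CRASH_PENALTY
    if state.reached_goal:
        return GOAL_REWARD
    return STEP_REWARD


def total_reward(states) -> float:
    """Sum the rewards over a sequence of states (one episode)."""
    return sum(reward(s) for s in states)


if __name__ == "__main__":
    episode = [
        RobotState(reached_goal=False, crashed=False),
        RobotState(reached_goal=False, crashed=False),
        RobotState(reached_goal=True, crashed=False),
    ]
    print(total_reward(episode))
```

In this sketch a crash takes priority over reaching the goal, which is one possible design choice; a real environment would define its own reward values and rules.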
