
Learning SARSA 

SARSA is another on-policy algorithm, one that was very popular, particularly in the 1990s. It is an extension of the TD learning we saw previously from state values to state-action values: SARSA maintains an estimate of the state-action value function, Q(st, at), and as new experiences are encountered, this state-action value function is updated using the Bellman equation of dynamic programming. Extending the preceding TD update rule to the state-action value function Q(st, at) gives the SARSA update:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) − Q(st, at)]

Here, from a given state st, we take action at, receive a reward rt+1, transition to a new state st+1, and then take action at+1, after which the process repeats. The quintuple (st, at, rt+1, st+1, at+1) gives the algorithm its name: SARSA. It is an on-policy algorithm because the same policy that is being improved is also the one used to generate the experience from which Q is estimated. For exploration, you can use, say, an ε-greedy policy.
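The update described above can be sketched in code. The following is a minimal tabular SARSA implementation with ε-greedy exploration; the 5-state chain environment, the `step` helper, and the hyperparameter values are illustrative assumptions, not taken from the text.

```python
import random

# Hypothetical environment: a 5-state chain (states 0..4).
# Action 1 moves right, action 0 moves left; reaching state 4
# (the goal) ends the episode with reward +1.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters


def step(s, a):
    """Deterministic chain dynamics: returns (next_state, reward, done)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL


def epsilon_greedy(Q, s):
    """Explore with probability EPSILON, otherwise act greedily w.r.t. Q."""
    if random.random() < EPSILON:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: Q[(s, a)])


def sarsa(episodes=200, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s)        # choose a_t from the current policy
        done = False
        while not done:
            s2, r, done = step(s, a)    # observe r_{t+1}, s_{t+1}
            a2 = epsilon_greedy(Q, s2)  # choose a_{t+1} from the SAME policy
            # On-policy SARSA update: the target bootstraps on Q(s_{t+1}, a_{t+1}),
            # the action the behavior policy actually takes next.
            target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s, a = s2, a2
    return Q


Q = sarsa()
```

Note that the update uses the quintuple (s, a, r, s2, a2) exactly as described: because a2 is drawn from the same ε-greedy policy being improved, the algorithm is on-policy, in contrast to Q-learning, which would bootstrap on max over actions instead.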
