Bellman equations

As we mentioned, the Q-table functions as your agent's brain: everything it has learned about its environment is stored in this table. The function that powers your agent's decisions is called a Bellman equation. There are many different Bellman equations, and we will be using a version of the following one:

newQ(s,a) = Q(s,a) + alpha * [R(s,a) + gamma * maxQ(s',a') - Q(s,a)]

Here, newQ(s,a) is the new value that we are computing for the state-action pair, to be entered into the Q-table; Q(s,a) is the current Q-value for that state-action pair; alpha is the learning rate; R(s,a) is the reward for that state-action pair; gamma is the discount rate; and maxQ(s',a') is the maximum expected future reward from the new state (that is, the highest Q-value across all of the actions the agent could take from the new state).

This equation might seem intimidating at first, but it will become much more straightforward once we start translating it into Python code. The maxQ(s',a') term, for instance, will be implemented with an argmax function, which we will discuss in detail. The same applies to most of the complex math we will encounter here: once you begin coding it, it becomes much simpler and clearer to understand.
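To make this concrete, here is a minimal sketch of the update in Python. The environment sizes, the alpha and gamma values, and the sample transition are all illustrative assumptions, not taken from the text; note that np.max retrieves the highest Q-value from the new state's row (np.argmax would instead give the index of the best action):

```python
import numpy as np

# Hypothetical sizes for illustration: 5 states, 4 actions.
n_states, n_actions = 5, 4
q_table = np.zeros((n_states, n_actions))  # the agent's "brain"

alpha = 0.1   # learning rate (assumed value)
gamma = 0.9   # discount rate (assumed value)

def update_q(state, action, reward, new_state):
    """Apply the Bellman update for one (state, action) transition."""
    # maxQ(s',a'): highest expected future reward over all actions
    # the agent could take from the new state.
    max_future_q = np.max(q_table[new_state])
    # newQ(s,a) = Q(s,a) + alpha * [R(s,a) + gamma * maxQ(s',a') - Q(s,a)]
    q_table[state, action] = q_table[state, action] + alpha * (
        reward + gamma * max_future_q - q_table[state, action]
    )

# Example transition: taking action 2 in state 0 yields reward 1.0
# and lands the agent in state 3.
update_q(state=0, action=2, reward=1.0, new_state=3)
```

Because the table starts at zero, the first update stores alpha * reward = 0.1 in the entry for state 0, action 2; repeated updates then propagate future rewards backward through the gamma term.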

主站蜘蛛池模板: 林西县| 绩溪县| 九龙城区| 康马县| 永仁县| 广东省| 无为县| 宁化县| 湘潭县| 潢川县| 博白县| 台州市| 电白县| 太谷县| 惠东县| 株洲县| 易门县| 扎囊县| 榆树市| 鄂伦春自治旗| 鄯善县| 台中市| 栾城县| 武强县| 淄博市| 双牌县| 宾阳县| 金昌市| 南宁市| 彭阳县| 乐山市| 黑山县| 阿克| 比如县| 聂荣县| 普定县| 石城县| 米泉市| 衡东县| 新乡县| 五常市|