
Q-Learning

Now, let's try to program a reinforcement learning agent using Q-learning. Q-learning maintains a Q-table that contains a Q-value for each state-action pair. The number of rows in the table equals the number of states in the environment, and the number of columns equals the number of actions. Since the number of states is 16 and the number of actions is 4, the Q-table for this environment consists of 16 rows and 4 columns. The code to check these counts is given here:

print("Number of actions : ",env.action_space.n)
print("Number of states : ",env.observation_space.n)

----------------------
Number of actions : 4
Number of states : 16
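
Given these counts, the Q-table can be created as a 16 x 4 array of zeros. Here is a minimal sketch using NumPy (the name q_table is an assumption for illustration):

import numpy as np

# One row per state, one column per action; all Q-values start at zero
q_table = np.zeros((env.observation_space.n, env.action_space.n))
print(q_table.shape)

----------------------
(16, 4)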

The steps involved in Q-learning are as follows:

  1. Initialize the Q-table with zeros (it will be updated during learning with the reward received for each action taken).
  2. Update the Q-value for a state-action pair, Q(s, a), according to:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]

In this formula:

    • s = current state
    • a = action taken (chosen through the epsilon-greedy approach)
    • s' = resulting new state
    • a' = action for the new state
    • r = reward received for action a
    • α = learning rate, that is, the rate at which the agent's learning converges towards minimized error
    • γ = discount factor, which discounts the future reward to convey how important that future reward is relative to the current reward
  3. By repeatedly updating the Q-values as per the formula in step 2, the table converges to accurate values for each action in a given state; a sketch of this full loop is given after this list.
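
To make these steps concrete, here is a minimal sketch of the full training loop, assuming the FrozenLake-v0 environment; the hyperparameter values (alpha, gamma, epsilon, and the episode count) are illustrative assumptions, not values from the text:

import gym
import numpy as np

env = gym.make("FrozenLake-v0")  # assumed environment (16 states, 4 actions)

alpha = 0.1       # learning rate (illustrative value)
gamma = 0.99      # discount factor (illustrative value)
epsilon = 0.1     # exploration probability for epsilon-greedy (illustrative)
episodes = 10000  # number of training episodes (illustrative)

# Step 1: initialize the Q-table with zeros
q_table = np.zeros((env.observation_space.n, env.action_space.n))

for _ in range(episodes):
    s = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = np.argmax(q_table[s])
        s_next, r, done, _ = env.step(a)
        # Step 2: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        q_table[s, a] += alpha * (r + gamma * np.max(q_table[s_next]) - q_table[s, a])
        s = s_next

After enough episodes, np.argmax(q_table[s]) gives the greedy action for each state s.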