Book title: PyTorch 1.x Reinforcement Learning Cookbook
Author: Yuxi (Hayden) Liu
Last updated: 2021-06-24 12:34:45

How to do it...

Creating an MDP can be done via the following steps:

Import PyTorch and define the transition matrix:

 >>> import torch
 >>> T = torch.tensor([[[0.8, 0.1, 0.1],
 ...                    [0.1, 0.6, 0.3]],
 ...                   [[0.7, 0.2, 0.1],
 ...                    [0.1, 0.8, 0.1]],
 ...                   [[0.6, 0.2, 0.2],
 ...                    [0.1, 0.4, 0.5]]]
 ...                  )
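
Before using the transition matrix, it can help to confirm that it describes valid probability distributions. The following is a minimal sketch, not part of the original recipe. It assumes the layout implied by the indexing used later in this recipe (T[:, action]), that is, T[s, a, s'] is the probability of moving from state s to state s' after taking action a. The names row_sums and is_valid are introduced here only for illustration.

```python
import torch

# Same transition tensor as in the recipe.
# Assumed layout: T[s, a, s'] = P(next state s' | state s, action a).
T = torch.tensor([[[0.8, 0.1, 0.1],
                   [0.1, 0.6, 0.3]],
                  [[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]],
                  [[0.6, 0.2, 0.2],
                   [0.1, 0.4, 0.5]]])

n_states, n_actions, _ = T.shape

# For every (state, action) pair, the probabilities over next states
# should add up to 1.
row_sums = T.sum(dim=-1)
is_valid = torch.allclose(row_sums, torch.ones(n_states, n_actions))

print("Shape:", tuple(T.shape))
print("Each (state, action) row sums to 1:", is_valid)
```

If is_valid is False, at least one row is not a probability distribution, and the values computed from T would not be meaningful.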

Define the reward function and the discount factor:

 >>> R = torch.tensor([1., 0, -1.])
 >>> gamma = 0.5

The optimal policy in this case is to select action a0 in every state:

 >>> action = 0

We calculate the value, V, of the optimal policy using the matrix inversion method in the following function:

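 >>> def cal_value_matrix_inversion(gamma, trans_matrix, rewards):
 ...     # Invert (I - gamma * T), where I is the identity matrix
 ...     inv = torch.inverse(torch.eye(rewards.shape[0])
 ...                         - gamma * trans_matrix)
 ...     # Multiply by the rewards, reshaped into a column vector
 ...     V = torch.mm(inv, rewards.reshape(-1, 1))
 ...     return V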

We will demonstrate how to derive the value in the next section.
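
As a side note, the same values can be obtained without forming the inverse explicitly, by solving the linear system (I - gamma * T) V = R. The sketch below is an alternative to the recipe's function, not a replacement for it. The helper name cal_value_linear_solve is hypothetical, and the code assumes a PyTorch version that provides torch.linalg.solve (1.8 or later). The 3 x 3 matrix is the slice of T for action a0, written out so that the example is self-contained.

```python
import torch

def cal_value_linear_solve(gamma, trans_matrix, rewards):
    # Hypothetical helper (not from the recipe): solve
    # (I - gamma * T) V = R directly instead of inverting the matrix.
    A = torch.eye(rewards.shape[0]) - gamma * trans_matrix
    return torch.linalg.solve(A, rewards.reshape(-1, 1))

# Transition probabilities under action a0, copied from T above.
trans_matrix = torch.tensor([[0.8, 0.1, 0.1],
                             [0.7, 0.2, 0.1],
                             [0.6, 0.2, 0.2]])
rewards = torch.tensor([1., 0., -1.])
gamma = 0.5

V_solve = cal_value_linear_solve(gamma, trans_matrix, rewards)

# Reference result using explicit inversion, as in the recipe.
V_inv = torch.mm(torch.inverse(torch.eye(3) - gamma * trans_matrix),
                 rewards.reshape(-1, 1))

print("Both approaches agree:", torch.allclose(V_solve, V_inv, atol=1e-5))
```

For a three-state problem the two approaches are interchangeable; solving the system is generally preferred numerically when the state space grows.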

We pass the discount factor, the rewards, and the transition probabilities associated with action a0 to the function:

 >>> trans_matrix = T[:, action]
 >>> V = cal_value_matrix_inversion(gamma, trans_matrix, R)
 >>> print("The value function under the optimal policy is:\n{}".format(V))
 The value function under the optimal policy is:
 tensor([[ 1.6787],
         [ 0.6260],
         [-0.4820]])
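
One way to sanity-check this result is to confirm that V satisfies the relation that the matrix inversion solves, V = R + gamma * T * V, for the fixed policy. The sketch below is an illustrative check under that assumption, not part of the original recipe. The names residual and max_residual are introduced here for illustration.

```python
import torch

gamma = 0.5
R = torch.tensor([1., 0., -1.]).reshape(-1, 1)

# Transition probabilities under action a0 (T[:, 0] in the recipe).
trans_matrix = torch.tensor([[0.8, 0.1, 0.1],
                             [0.7, 0.2, 0.1],
                             [0.6, 0.2, 0.2]])

# Value of the policy via matrix inversion, as in the recipe.
V = torch.mm(torch.inverse(torch.eye(3) - gamma * trans_matrix), R)

# If V is correct, plugging it back into R + gamma * T V reproduces V,
# so the residual should be close to zero.
residual = V - (R + gamma * torch.mm(trans_matrix, V))
max_residual = residual.abs().max().item()

print("Largest absolute residual:", max_residual)
```

A residual close to zero (up to floating-point error) indicates that the computed values are consistent with the rewards, the discount factor, and the transition probabilities defined earlier.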
