
Setting up the Agent

Agents represent the actors that we train to perform a task, or a set of task-based commands, in exchange for some reward. We will cover actors, actions, state, and rewards in more detail when we discuss Reinforcement Learning in Chapter 2, The Bandit and Reinforcement Learning. For now, all we need to do is set the Brain that the agent will use. Open up the editor and follow these steps:

  1. Locate the Agent object in the Hierarchy window and select it.
  2. Click the Target icon beside the Brain property on the Simple Agent component and select the Brain object in the scene, as shown in the following screenshot:
Setting the Agent Brain
  3. Click the Gear icon on the Simple Agent component and, from the context menu, select Edit Script. The agent script is what we use to observe the environment and collect observations. In our current example, we always assume that there is no previous observation.
  4. Enter the highlighted code in the CollectObservations method as follows:
      public override void CollectObservations()
      {
          AddVectorObs(0);
      }
  5. CollectObservations is the method called to set what the Agent observes about the environment. This method is called on every agent step or action. We use AddVectorObs to add a single float value of 0 to the agent's observation collection. At this point, we are not using any observations, and we assume our bandit provides no visual cues as to which arm to pull.
     The agent also needs to evaluate the rewards and determine when they are collected. We will need to add four slots to our agent, one for each arm, to represent the reward given when that arm is pulled.
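  6. Enter the following code in the SimpleAgent class:
      public Bandit bandit;

      public override void AgentAction(float[] vectorAction,
          string textAction)
      {
          // The first element of the action vector selects the arm to pull.
          var action = (int)vectorAction[0];
          AddReward(bandit.PullArm(action));
          // Each pull is a complete episode, so finish immediately.
          Done();
      }

      public override void AgentReset()
      {
          // Return the bandit to its starting state.
          bandit.Reset();
      }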
  7. The code in our AgentAction method takes the current action and applies it to the Bandit with the PullArm method, passing in the arm to pull. The reward returned from the bandit is added using AddReward. After that, we implement the AgentReset method, which resets the Bandit back to its starting state. AgentReset is called when the agent is done, complete, or runs out of steps. Notice how we call the Done method after each step; this is because our bandit has only a single state and each episode consists of a single action.
  8. Add the following code just below the last section:
      public Academy academy;
      public float timeBetweenDecisionsAtInference;
      private float timeSinceDecision;

      public void FixedUpdate()
      {
          WaitTimeInference();
      }

      private void WaitTimeInference()
      {
          if (!academy.GetIsInference())
          {
              // Not in inference mode: request a decision on every fixed step.
              RequestDecision();
          }
          else
          {
              // Inference mode: request a decision only once the delay has elapsed.
              if (timeSinceDecision >= timeBetweenDecisionsAtInference)
              {
                  timeSinceDecision = 0f;
                  RequestDecision();
              }
              else
              {
                  timeSinceDecision += Time.fixedDeltaTime;
              }
          }
      }
  9. We need the preceding code so that our brain waits long enough to accept Player decisions. The first example we build will use player input. Don't worry too much about this code, as we only need it to allow for player input. When we develop our Agent Brains, we won't need to add a delay.
  10. Save the script when you are done editing.
  11. Return to the editor and set the properties on the Simple Agent, as shown in the following screenshot:
Setting the Simple Agent properties
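
As a rough illustration of the timing logic in WaitTimeInference, the following plain C# sketch reproduces the same gate without Unity or ML-Agents. DecisionGate, Tick, and the values passed to them are hypothetical names chosen for this sketch; they are not part of the ML-Agents API, and the real script relies on Time.fixedDeltaTime and RequestDecision instead.

```csharp
using System;

// Plain C# sketch of the timing gate used in WaitTimeInference.
// DecisionGate and Tick are hypothetical names, not ML-Agents API.
public class DecisionGate
{
    private readonly float interval;   // plays the role of timeBetweenDecisionsAtInference
    private float timeSinceDecision;

    public DecisionGate(float interval)
    {
        this.interval = interval;
    }

    // Call once per fixed step. Returns true when a decision should be requested.
    public bool Tick(bool isInference, float fixedDeltaTime)
    {
        if (!isInference)
        {
            return true;               // not in inference mode: decide every step
        }
        if (timeSinceDecision >= interval)
        {
            timeSinceDecision = 0f;    // delay elapsed: reset and decide
            return true;
        }
        timeSinceDecision += fixedDeltaTime;
        return false;
    }
}
```

With an interval of 0.5 and a fixed step of 0.25, Tick returns true on every third call in inference mode, because the timer does not advance on the step that requests a decision.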

We are almost done. The agent is now able to interpret our actions and execute them on the Bandit. Actions are sent to the agent from the Brain. The Brain is responsible for making decisions and we will cover its setup in the next section.
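
To make the flow from action to reward concrete, here is a minimal sketch in plain C# that mimics what AgentAction and AgentReset do, with no Unity or ML-Agents dependency. SketchBandit and SketchAgent are hypothetical stand-ins invented for this illustration: the real Bandit component is defined elsewhere and may behave differently, and the reward values used with it are made up.

```csharp
using System;

// Hypothetical stand-in for the scene's Bandit component (an assumption,
// not the book's class). Each arm simply returns a fixed reward.
public class SketchBandit
{
    private readonly float[] armRewards;
    public int Pulls { get; private set; }

    public SketchBandit(float[] armRewards)
    {
        this.armRewards = armRewards;
    }

    // Returns the reward for the chosen arm, or 0 for an out-of-range index.
    public float PullArm(int arm)
    {
        Pulls++;
        if (arm < 0 || arm >= armRewards.Length) return 0f;
        return armRewards[arm];
    }

    public void Reset()
    {
        Pulls = 0;
    }
}

// Mirrors the shape of the agent methods without the ML-Agents base class.
public class SketchAgent
{
    private readonly SketchBandit bandit;
    public float Reward { get; private set; }
    public bool IsDone { get; private set; }

    public SketchAgent(SketchBandit bandit)
    {
        this.bandit = bandit;
    }

    public void AgentAction(float[] vectorAction)
    {
        int action = (int)vectorAction[0];   // first element selects the arm
        Reward += bandit.PullArm(action);    // plays the role of AddReward
        IsDone = true;                       // plays the role of Done()
    }

    public void AgentReset()
    {
        bandit.Reset();
        Reward = 0f;
        IsDone = false;
    }
}
```

For example, constructing a SketchBandit with four made-up rewards and calling AgentAction with an action vector whose first element is 2 adds the third arm's reward and marks the episode as done; AgentReset then returns both objects to their starting state.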
