Inside the Trainer class, a large portion was rewritten to handle the expanded features used and to provide regression algorithm evaluation as opposed to the binary classification we looked at in Chapter 2, Setting Up the ML.NET Environment.
The first change is the use of a comma to separate the data as opposed to the default tab like we used in Chapter 2, Setting Up the ML.NET Environment:
var trainingDataView = MlContext.Data.LoadFromTextFile<EmploymentHistory>(trainingFileName, ',');
The next change is in the pipeline creation itself. In our first application, we had a label and fed that straight into the pipeline. With this application, we have nine features to predict the duration of a person's employment in the DurationInMonths property and append each one of them to the pipeline using the C# 6.0 feature, nameof. You might have noticed the use of magic strings to map class properties to features in various code samples on GitHub and MSDN; personally, I find this error-prone compared to the strongly typed approach.
For every property, we call the NormalizeMeanVariance transform method, which as the name implies normalizes the input data both on the mean and the variance. ML.NET computes this by subtracting the mean of the input data and dividing that value by the variance of the inputted data. The purpose behind this is to nullify outliers in the input data so the model isn't skewed to handle an edge case compared to the normal range. For example, suppose the sample dataset of employment history had 20 rows and all but one of those rows had a person with 50 years experience. The one row that didn't fit would be normalized to better fit within the ranges of values entered into the model.
In addition, note the use of the extension method referred to earlier to help to simplify the following code, when we concatenate all of the feature columns:
We can then create the Sdca trainer using the default parameters ("Label" and "Features"):
var trainer = MlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
Lastly, we call the Regression.Evaluate method to provide regression specific metrics, followed by a Console.WriteLine call to provide these metrics to your console output. We will go into detail about what each of these means in the last section of this chapter:
var modelMetrics = MlContext.Regression.Evaluate(testSetTransform);