
Data understanding and preparation

To start, we will install the necessary packages and load the ones we need into the environment. The data is in the MASS package:

> library(magrittr)

> install.packages("caret")

> install.packages("MASS")

> library(MASS)

> install.packages("neuralnet")

> install.packages("vtreat")

The neuralnet package will be used for building the model and caret for data preparation. Let's load the data and examine its structure:

> data(shuttle)

> str(shuttle)

The data consists of 256 observations of 7 variables. All of the features are categorical, and the response is use, with two levels, auto and noauto. The six features are as follows:

  • stability: This is stable positioning or not (stab / xstab)
  • error: This is the size of the error (MM / SS / LX / XL)
  • sign: This is the sign of the error, positive or negative (pp / nn)
  • wind: This is the wind sign (head / tail)
  • magn: This is the wind strength (Light / Medium / Strong / Out of Range)
  • vis: This is the visibility (yes / no)

Here, we will look at a table of the response/outcome:

> table(shuttle$use)
  auto noauto
   145    111
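
If you prefer proportions to raw counts, a quick sketch is to wrap the table in prop.table(). The object name use_share is introduced here for illustration and is not part of the original code:

```r
library(MASS)
data(shuttle)

# Share of each response level; use_share is an illustrative name
use_share <- prop.table(table(shuttle$use))
round(use_share, 3)
```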

Almost 57% of the time, the decision is to use the autolander. We'll now get our training and testing data set up for modeling:

> set.seed(1942)

> trainIndex <- caret::createDataPartition(shuttle$use, p = .6, list = FALSE)

> shuttleTrain <- shuttle[trainIndex, -7]

> shuttleTest <- shuttle[-trainIndex, -7]
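
A quick check, which is not part of the original workflow, is to confirm that the partition kept the class balance roughly the same in both sets. The sketch below repeats the split so that it runs on its own and assumes caret is installed; train_share and test_share are names introduced here for illustration:

```r
library(MASS)
data(shuttle)

set.seed(1942)
trainIndex <- caret::createDataPartition(shuttle$use, p = .6, list = FALSE)

# Response proportions in the rows assigned to train and to test
train_share <- prop.table(table(shuttle$use[trainIndex]))
test_share <- prop.table(table(shuttle$use[-trainIndex]))

round(train_share, 3)
round(test_share, 3)
```

If the two sets of proportions are close, the split is reasonably balanced with respect to the response.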

We are going to treat the data to create numeric features and drop the catP features that the function creates. We covered the idea of treating a dataframe in Chapter 1, Preparing and Understanding Data:

> treatShuttle <- vtreat::designTreatmentsZ(shuttleTrain, colnames(shuttleTrain))

> train_treated <- vtreat::prepare(treatShuttle, shuttleTrain)

> train_treated <- train_treated[, c(-1, -2)]

> test_treated <- vtreat::prepare(treatShuttle, shuttleTest)

> test_treated <- test_treated[, c(-1, -2)]
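
Dropping columns by position works, but it relies on knowing the column order. As a hedged alternative sketch, and not the approach used above, the treatment plan's scoreFrame records a code for each derived variable, so we could select by code instead. To keep the example self-contained, it treats all of the feature rows rather than only the training rows; features, treatments, sf, keep, and treated are illustrative names:

```r
library(MASS)
data(shuttle)

# All six feature columns (the response, use, is column 7)
features <- shuttle[, -7]

treatments <- vtreat::designTreatmentsZ(features, colnames(features),
                                        verbose = FALSE)

# One row per derived variable, with its source column and treatment code
sf <- treatments$scoreFrame
sf[, c("varName", "origName", "code")]

# Keep everything except the prevalence-coded (catP) variables
keep <- sf$varName[sf$code != "catP"]
treated <- vtreat::prepare(treatments, features, varRestriction = keep)
names(treated)
```

Selecting by code makes the intent explicit, at the cost of a slightly longer call.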

I find the next couple of portions of code awkward. Because neuralnet() requires a formula and the data in a dataframe, we have to turn the response into a numeric vector and then add it to our treated train and test data:

> shuttle_trainY <- shuttle[trainIndex, 7]

> train_treated$y <- ifelse(shuttle_trainY == "auto", 1, 0)

> shuttle_testY <- shuttle[-trainIndex, 7]

> test_treated$y <- ifelse(shuttle_testY == "auto", 1, 0)
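
To confirm the coding, a small sketch on the full response cross-tabulates the factor against the new 0/1 values. The object y here is illustrative and separate from the columns added above:

```r
library(MASS)
data(shuttle)

# Same recoding rule as above, applied to the whole response
y <- ifelse(shuttle$use == "auto", 1, 0)

# Every auto should land in column 1 and every noauto in column 0
table(shuttle$use, y)
```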

The neuralnet() function calls for a formula, as we used elsewhere, such as y ~ x1 + x2 + x3 + x4, data = df. In the past, we used y ~ . to specify all the other variables in the data as inputs. However, neuralnet does not accommodate this at the time of writing. The way around this limitation is to use the as.formula() function. After first creating an object of the variable names, we paste those names together to form the right-hand side of the formula:

> n <- names(train_treated)

> form <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))

The object form gives us what we need to build our model.
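
To see what this construction produces, here is a minimal sketch on a toy dataframe. The dataframe df and its columns x1 and x2 are hypothetical stand-ins for train_treated; form2 shows reformulate() from base R as one possible alternative, not the method used in this chapter:

```r
# Toy stand-in for train_treated; column names are hypothetical
df <- data.frame(x1 = c(1, 0, 1), x2 = c(0, 1, 1), y = c(1, 0, 1))

n <- names(df)

# Paste every column except y onto the right-hand side
form <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
form

# Alternative: reformulate() builds the same formula from a character vector
form2 <- reformulate(setdiff(n, "y"), response = "y")
form2
```

Both objects print as y ~ x1 + x2, which is the explicit form that a function without support for the dot shorthand needs.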
