Ensembles and model-averaging

Another approach to regularization involves creating multiple models (an ensemble) and combining them, such as by model-averaging or some other algorithm for combining the individual models' results. Ensemble techniques have a rich history in machine learning; bagging, boosting, and random forests all rely on this idea. The general idea is that, if you build several different models from the training data, each model makes different errors in its predicted values. Where one model predicts too high a value, another may predict too low a value, so when the predictions are averaged, some of the errors cancel out, resulting in a more accurate prediction than would otherwise have been obtained.

The key to ensemble methods is that the different models must have some variability in their predictions. If the predictions from the different models are highly correlated, ensembling offers little benefit. If the predictions from the different models have very low correlations, the average can be far more accurate, as it combines the strengths of each model. The following code gives an example using simulated data; this small example illustrates the point with just three models:

## simulated data
set.seed(1234)
d <- data.frame(x = rnorm(400))
d$y <- with(d, rnorm(400, 2 + ifelse(x < 0, x + x^2, x + x^2.5), 1))
d.train <- d[1:200, ]
d.test <- d[201:400, ]

## three different models
m1 <- lm(y ~ x, data = d.train)
m2 <- lm(y ~ I(x^2), data = d.train)
m3 <- lm(y ~ pmax(x, 0) + pmin(x, 0), data = d.train)

## in-sample R^2
cbind(M1 = summary(m1)$r.squared,
      M2 = summary(m2)$r.squared,
      M3 = summary(m3)$r.squared)
       M1  M2   M3
[1,] 0.33 0.6 0.76

We can see that the predictive value of each model, at least in the training data, varies quite a bit. Evaluating the correlations among fitted values in the training data can also help to indicate how much overlap there is among the model predictions:

cor(cbind(M1 = fitted(m1),
          M2 = fitted(m2),
          M3 = fitted(m3)))
     M1   M2   M3
M1 1.00 0.11 0.65
M2 0.11 1.00 0.78
M3 0.65 0.78 1.00

Next, we generate predicted values for the testing data and their average, and then correlate the predictions, along with the observed outcome, in the testing data:

## generate predictions and the average prediction
d.test$yhat1 <- predict(m1, newdata = d.test)
d.test$yhat2 <- predict(m2, newdata = d.test)
d.test$yhat3 <- predict(m3, newdata = d.test)
d.test$yhatavg <- rowMeans(d.test[, paste0("yhat", 1:3)])

## correlation in the testing data
cor(d.test)
              x    y  yhat1  yhat2 yhat3 yhatavg
x         1.000 0.44  1.000 -0.098  0.60    0.55
y         0.442 1.00  0.442  0.753  0.87    0.91
yhat1     1.000 0.44  1.000 -0.098  0.60    0.55
yhat2    -0.098 0.75 -0.098  1.000  0.69    0.76
yhat3     0.596 0.87  0.596  0.687  1.00    0.98
yhatavg   0.552 0.91  0.552  0.765  0.98    1.00

From the results, we can see that the average of the three models' predictions performs better than any individual model. However, this is not always the case; a single good model may outperform the averaged predictions. In general, it is worth checking that the models being averaged perform similarly, at least in the training data. The second lesson is that, among models with similar performance, lower correlations between their predictions are desirable, as this yields the best-performing average.
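To make the comparison concrete, we can compute the out-of-sample error directly rather than relying on correlations alone. The sketch below re-runs the simulation and models from above so that it is self-contained; `rmse` is a small helper function defined here for illustration, not part of base R:

```r
## recreate the simulated data and the three models from above
set.seed(1234)
d <- data.frame(x = rnorm(400))
d$y <- with(d, rnorm(400, 2 + ifelse(x < 0, x + x^2, x + x^2.5), 1))
d.train <- d[1:200, ]
d.test <- d[201:400, ]

m1 <- lm(y ~ x, data = d.train)
m2 <- lm(y ~ I(x^2), data = d.train)
m3 <- lm(y ~ pmax(x, 0) + pmin(x, 0), data = d.train)

## helper: root mean squared error (defined here for this example)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

## test-set predictions as a 200 x 3 matrix, one column per model
preds <- sapply(list(M1 = m1, M2 = m2, M3 = m3),
                predict, newdata = d.test)

rmse.each <- apply(preds, 2, function(p) rmse(d.test$y, p))
rmse.avg  <- rmse(d.test$y, rowMeans(preds))
round(c(rmse.each, Average = rmse.avg), 2)
```

With this seed, the averaged prediction attains the lowest test-set RMSE, matching the correlation results above.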

Other forms of ensemble methods are built into machine learning algorithms themselves; bagging and boosting are two examples. Bagging (bootstrap aggregating) is used in random forests, where many models are trained, each on a different bootstrap sample of the data, and each model is deliberately kept small and incomplete. By averaging the predictions of many undertrained models that each see only a portion of the data, we obtain a more powerful overall model. An example of boosting is gradient-boosted machines (GBMs), which also use multiple models, but here each successive model focuses on the instances that the previous model predicted incorrectly. Both random forests and GBMs have proven very successful on structured data because they reduce variance, that is, they help avoid overfitting the data.
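The bagging idea can be sketched in a few lines of R. The example below uses the `rpart` package (which ships with R) to fit deliberately shallow regression trees, each on a bootstrap resample of the training data, and then averages their predictions; the number of trees and the depth limit are arbitrary choices for illustration:

```r
library(rpart)  # small regression trees as the weak learners

## same simulated data as in the earlier example
set.seed(1234)
d <- data.frame(x = rnorm(400))
d$y <- with(d, rnorm(400, 2 + ifelse(x < 0, x + x^2, x + x^2.5), 1))
d.train <- d[1:200, ]
d.test <- d[201:400, ]

## fit B shallow trees, each to a bootstrap resample of the training data
B <- 50
bag.preds <- replicate(B, {
  i <- sample(nrow(d.train), replace = TRUE)
  fit <- rpart(y ~ x, data = d.train[i, ],
               control = rpart.control(maxdepth = 2))
  predict(fit, newdata = d.test)
})

## the bagged prediction is the average over the bootstrap models
yhat.bag <- rowMeans(bag.preds)
```

Each individual tree is a weak, high-variance learner; averaging across bootstrap resamples smooths out that variance, which is exactly the mechanism random forests exploit.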

Bagging and model-averaging are used less frequently in deep neural networks because the computational cost of training each model can be quite high, so repeating the process many times becomes prohibitively expensive in time and compute resources. Nevertheless, it is still possible to use model averaging with deep neural networks, even if only over a handful of models rather than the hundreds that are common in random forests and some other approaches.
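As a small-scale illustration of this idea, the sketch below averages the predictions of a handful of single-hidden-layer networks fit with the `nnet` package (which ships with R), each started from different random weights; the network size, weight decay, and number of networks are arbitrary choices for illustration, not a recommendation:

```r
library(nnet)  # single-hidden-layer neural networks

## same simulated data as in the earlier example
set.seed(1234)
d <- data.frame(x = rnorm(400))
d$y <- with(d, rnorm(400, 2 + ifelse(x < 0, x + x^2, x + x^2.5), 1))
d.train <- d[1:200, ]
d.test <- d[201:400, ]

## train a handful of small networks from different random starts
nets <- lapply(1:5, function(i) {
  nnet(y ~ x, data = d.train, size = 5, linout = TRUE,
       decay = 0.01, maxit = 500, trace = FALSE)
})

## average the predictions of the individual networks
net.preds <- sapply(nets, predict, newdata = d.test)
yhat.netavg <- rowMeans(net.preds)
```

Because each network converges to a different local solution, their prediction errors are only partially correlated, and the same cancellation argument as before applies, just on a smaller number of models.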