
How do ensembles work?

The randomness inherent in random forests may make it seem like we are leaving the results of the algorithm up to chance. However, by averaging the predictions of many nearly randomly built decision trees, we obtain an algorithm that reduces the variance of the result.

Variance is the error introduced by the algorithm's sensitivity to variations in the training dataset. Algorithms with a high variance (such as decision trees) can be greatly affected by variations in the training dataset, which results in models that overfit. In contrast, bias is the error introduced by assumptions built into the algorithm rather than by anything to do with the dataset. For example, if we had an algorithm that presumed that all features were normally distributed, it might have a high error if the features were not.

The negative impact of bias can be reduced by analyzing the data to check whether the classifier's model of the data matches the actual data.

To use an extreme example, a classifier that always predicts true, regardless of the input, has a very high bias. A classifier that predicts at random would have a very high variance. Both classifiers have a high error, but the errors are of a different nature.
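The two extreme classifiers can be sketched in plain Python to make the difference concrete. This is a toy setup under stated assumptions: the data (the label is true when `x` is positive) and all function names are hypothetical, and each random seed stands in for a separate training run. The constant classifier makes exactly the same mistakes every time, while the random classifier makes different mistakes on every run.

```python
import random


def always_true(_x):
    """High bias: ignores the input and always predicts True."""
    return True


def make_random_classifier(seed):
    """High variance: guesses at random; the seed stands in for a training run."""
    rng = random.Random(seed)
    return lambda _x: rng.random() < 0.5


# Hypothetical labelled data: the label is True when x is positive.
inputs = list(range(-50, 50))
labels = [x > 0 for x in inputs]


def accuracy(classifier):
    """Fraction of inputs the classifier labels correctly."""
    hits = sum(classifier(x) == y for x, y in zip(inputs, labels))
    return hits / len(inputs)


def errors(classifier):
    """The set of inputs the classifier gets wrong."""
    return {x for x, y in zip(inputs, labels) if classifier(x) != y}


# The constant classifier is wrong on the same 51 inputs (x <= 0) every time.
print(accuracy(always_true))  # prints 0.49

# The random classifier is wrong on a different set of inputs on each run.
for seed in range(3):
    print(seed, len(errors(make_random_classifier(seed))))
```

The systematic, repeatable mistakes of `always_true` are the signature of bias; the run-to-run churn of the random classifier is the signature of variance.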

Averaging the predictions of a large number of decision trees greatly reduces this variance. This usually results in a model with higher overall accuracy and better predictive power. The trade-offs are a longer running time and an increase in the bias of the algorithm.
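The variance reduction from averaging can be checked numerically with a minimal pure-Python sketch. It uses bagging (fit each model on a bootstrap resample of the training data, then average), which is one ingredient of random forests; real random forests also randomize the features considered at each split, which this one-feature sketch omits. The data (`y = sin(x)` plus Gaussian noise), the helper names, and the learner are hypothetical choices: in one dimension a fully grown regression tree that splits at midpoints ends up predicting the target of the nearest training point, so a 1-nearest-neighbour predictor serves as a compact stand-in for a deep tree.

```python
import math
import random
import statistics


def make_training_set(rng, n=30):
    """Hypothetical data: y = sin(x) plus Gaussian noise (sd 0.3)."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


def nearest_neighbour_predict(train, x0):
    """High-variance learner: return the target of the closest training point.

    Assumption: stands in for a fully grown 1-D regression tree.
    """
    _, y = min(train, key=lambda point: abs(point[0] - x0))
    return y


def bagged_predict(train, x0, rng, n_models=25):
    """Average the predictions of models fitted to bootstrap resamples."""
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points with replacement.
        sample = [rng.choice(train) for _ in train]
        predictions.append(nearest_neighbour_predict(sample, x0))
    return statistics.fmean(predictions)


def prediction_variance(n_trials=200, x0=3.0, seed=0):
    """Variance of single vs bagged predictions at x0 over fresh training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        train = make_training_set(rng)
        single.append(nearest_neighbour_predict(train, x0))
        bagged.append(bagged_predict(train, x0, rng))
    return statistics.variance(single), statistics.variance(bagged)


single_var, bagged_var = prediction_variance()
print(f"single model variance: {single_var:.4f}")
print(f"bagged model variance: {bagged_var:.4f}")
```

The bagged prediction should vary noticeably less from one training set to the next than the single model's. It also costs 25 model fits instead of one, which illustrates the time trade-off.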

In general, ensembles work on the assumption that prediction errors are effectively random and differ substantially from one classifier to another. Averaging the results across many models lets these random errors cancel out, leaving the true prediction. We will see many more ensembles in action throughout the rest of the book.
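The "errors cancel out" argument can be quantified in its idealized form. The sketch below combines classifiers by majority vote (a close cousin of averaging for classification) and assumes, as a simplification, that every classifier is correct with the same probability and that their errors are fully independent. Under those assumptions the ensemble's accuracy is a binomial tail sum. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of classifiers is correct.

    Assumes each classifier is right with probability p_correct and that
    their errors are independent (the idealized ensemble assumption).
    """
    needed = n_classifiers // 2 + 1  # votes required for a strict majority
    return sum(
        comb(n_classifiers, k)
        * p_correct**k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


# Odd ensemble sizes avoid tied votes.
for n in (1, 3, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Three classifiers that are each right 60% of the time form a majority vote that is right 64.8% of the time, and the gain keeps growing with the ensemble size. In practice the errors of trees trained on similar data are correlated, which shrinks this gain; injecting randomness into how each tree is built is one way to keep their errors different.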
