官术网_书友最值得收藏!

Train and test sets

To estimate the generalization error, we split our data into two parts: training data and testing data. A general rule of thumb is to split them by the training: testing ratio, that is, 70:30. We first train the predictor on the training data, then predict the values for the test data, and finally, compute the error, that is, the difference between the predicted and the true values. This gives us an estimate of the true generalization error.

The estimation is based on the two following assumptions: first, we assume that the test set is an unbiased sample from our dataset; and second, we assume that the actual new data will reassemble the distribution as our training and testing examples. The first assumption can be mitigated by cross-validation and stratification. Also, if it is scarce, one can't afford to leave out a considerable amount of data for a separate test set, as learning algorithms do not perform well if they don't receive enough data. In such cases, cross-validation is used instead.

主站蜘蛛池模板: 沂源县| 岢岚县| 佛冈县| 内黄县| 秦皇岛市| 安福县| 辽阳县| 天柱县| 桐乡市| 尼玛县| 巫山县| 广宗县| 清镇市| 长春市| 南丰县| 高邑县| 行唐县| 梓潼县| 眉山市| 龙井市| 稻城县| 阿勒泰市| 内丘县| 天峻县| 泰顺县| 清苑县| 惠州市| 兰西县| 宜昌市| 福安市| 辽源市| 沙湾县| 吴川市| 武安市| 怀化市| 张家港市| 东光县| 蒙山县| 定襄县| 桑植县| 杭州市|