
Overfitting

Overfitting occurs when a model fits the training data so closely that it fails to generalize to new data.

Say you have a single predictor of an outcome and that the data follows a quadratic pattern:

  1. You fit a linear regression on that data, and the predictions are weak. Your model is underfitting the data: the error is high on both the training and the validation datasets.
  2. You add the square of the predictor to the model and find that it makes good predictions. The errors on the training and validation datasets are comparable, and both are lower than for the simpler model.
  3. If you increase the number and power of polynomial features so that the model becomes a high-degree polynomial of the predictor, you end up fitting the training data too closely. The model has a very low prediction error on the training dataset but predicts poorly on new data. The prediction error on the validation dataset remains high.

This is a case of overfitting.
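The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the coefficients, noise level, and sample sizes are made up, and NumPy's `Polynomial.fit` stands in for whatever regression tool you actually use.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Synthetic data following a quadratic pattern plus noise.
# The coefficients and the noise level are arbitrary, illustrative choices.
x = rng.uniform(-1, 1, size=40)
y = 2.0 * x**2 - x + 0.5 + rng.normal(scale=0.1, size=x.size)

# Keep the first half for training and the second half for validation.
x_train, y_train = x[:20], y[:20]
x_val, y_val = x[20:], y[20:]


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


# Degree 1 = linear model, 2 = quadratic model, 16 = high-degree polynomial.
errors = {}
for degree in (1, 2, 16):
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    train_err, val_err = errors[degree]
    print(f"degree {degree:2d}: train MSE = {train_err:.4f}, "
          f"validation MSE = {val_err:.4f}")
```

On data like this, the linear model typically shows high errors on both sets, the quadratic model shows low and comparable errors, and the degree 16 model shows a very low training error alongside a much higher validation error.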

The following graph shows an example of an overfitting model on the previous quadratic dataset, obtained by setting a high order for the polynomial regression (n = 16). The polynomial regression fits the training data so closely that it would predict poorly on new data, whereas the quadratic model (n = 2) would be more robust:

The best way to detect overfitting is, therefore, to compare the prediction errors on the training and validation sets. A significant gap between the two errors, with the validation error much higher than the training error, implies overfitting. A way to prevent overfitting is to add constraints on the model. In machine learning, this technique is called regularization.
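The comparison of training and validation errors can be turned into a simple check. The helper names `error_gap` and `looks_overfit`, the threshold `max_ratio=2.0`, and the error values below are all hypothetical; what counts as a "significant" gap depends on the problem and the noise in the data.

```python
def error_gap(train_error: float, val_error: float) -> float:
    """Return how many times larger the validation error is than the training error."""
    if train_error <= 0:
        # A zero training error with a nonzero validation error is the extreme case.
        return float("inf") if val_error > 0 else 1.0
    return val_error / train_error


def looks_overfit(train_error: float, val_error: float, max_ratio: float = 2.0) -> bool:
    """Flag a model whose validation error exceeds max_ratio times its training error.

    max_ratio is an arbitrary illustrative threshold, not a standard value.
    """
    return error_gap(train_error, val_error) > max_ratio


# Hypothetical error values, for illustration only.
print(looks_overfit(0.011, 0.012))  # comparable errors -> False
print(looks_overfit(0.001, 0.350))  # large gap -> True
```

Note that this check only looks at the gap. An underfitting model has high errors on both sets and would not be flagged, so the absolute error level still needs to be inspected as well.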
