
Apparent (training set) error

This is the first type of error, and it is not the one you should care most about minimizing. Getting a small value for this type of error doesn't mean that your model will work well on unseen data (generalize). To better understand this type of error, consider a simple classroom scenario. The purpose of solving problems in the classroom is not to be able to solve the same problems again in the exam, but to be able to solve other problems that won't necessarily be similar to the ones you practiced in the classroom. The exam problems could be from the same family as the classroom problems, but not necessarily identical.

Apparent error measures how well the trained model performs on the training set, for which we already know the true outcome/output. If you manage to get 0 error over the training set, it is often a warning sign that the model has memorized the training data and (mostly) won't work well on unseen data (won't generalize). Data science, on the other hand, is about using a training set as base knowledge so that the learning algorithm works well on future unseen data.

In the following figure, the red curve represents the apparent error. Whenever you increase the model's ability to memorize things (such as increasing the model complexity by increasing the number of explanatory features), you will find that this apparent error approaches zero. It can be shown that if a linear model has as many (linearly independent) features as observations/samples, then it can fit the training set exactly and the apparent error will be zero:

Figure 13: Apparent error (red curve) and generalization/true error (light blue)
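
The behavior of the red curve can be reproduced with a small experiment. The following is only a minimal sketch under stated assumptions, not the book's own code: it uses synthetic data in which only the first two features carry signal, an ordinary least-squares linear model solved through the normal equations, and hypothetical helper names (`solve`, `fit_least_squares`, `mse`, `make_data`). It is written in plain Python to stay self-contained; in practice a numerical library would do the fitting.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def mse(X, y, w):
    """Mean squared error of the linear model w on the data (X, y)."""
    preds = [sum(a * b for a, b in zip(row, w)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def make_data(n, d):
    """Assumed synthetic data: only the first two features carry signal."""
    X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 1) for row in X]
    return X, y


n_train, n_test, d = 8, 200, 8  # as many features as training samples
X_train, y_train = make_data(n_train, d)
X_test, y_test = make_data(n_test, d)

apparent_errors, test_errors = [], []
for k in range(1, d + 1):  # grow model complexity one feature at a time
    Xk_train = [row[:k] for row in X_train]
    Xk_test = [row[:k] for row in X_test]
    w = fit_least_squares(Xk_train, y_train)
    apparent_errors.append(mse(Xk_train, y_train, w))  # error on the training set
    test_errors.append(mse(Xk_test, y_test, w))        # error on unseen data

for k, (a, t) in enumerate(zip(apparent_errors, test_errors), start=1):
    print(f"features={k}  apparent_mse={a:.6f}  test_mse={t:.6f}")
```

Under these assumptions, the apparent error can only shrink as features are added, and it reaches (numerically) zero once the number of features equals the number of training samples. The error on the held-out samples does not follow it down, which is why a small apparent error alone says little about generalization.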