
Underfitting

A model with a high bias is likely to underfit the training set. Let's consider the scenario shown in the following graph:

Underfitted classifier: the curve cannot correctly separate the two classes

Even if the problem is very hard, we could try to adopt a linear model. At the end of the training process, the slope and the intercept of the separating line are about -1 and 0 (as shown in the plot); however, if we measure the accuracy, we discover that it's close to 0! Regardless of the number of iterations, this model will never be able to learn the association between X and Y. This condition is called underfitting, and its major indicator is a very low training accuracy. Although some data preprocessing steps can improve the accuracy, when a model is underfitted, the only valid solution is to adopt a higher-capacity model.
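As a rough illustration of this behavior, the following sketch trains a logistic regression on a toy dataset and then repeats the training after adding squared features, which increases the capacity of the model. The dataset (two concentric rings), the helper names `train_logistic` and `accuracy`, and the hyperparameters are assumptions made for this example; they are not the data or the model behind the plot above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: class 0 is an inner ring, class 1 an outer ring.
# No straight line can separate the two classes.
n = 200
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([np.zeros(n), np.ones(n)])


def train_logistic(features, labels, lr=0.1, epochs=5000):
    """Fit a logistic regression with batch gradient descent.

    Returns the weight vector; the last component is the bias.
    """
    A = np.column_stack([features, np.ones(len(features))])
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))       # predicted probabilities
        w -= lr * (A.T @ (p - labels)) / len(labels)
    return w


def accuracy(features, labels, w):
    """Fraction of samples on the correct side of the decision boundary."""
    A = np.column_stack([features, np.ones(len(features))])
    return float(np.mean((A @ w > 0) == (labels == 1)))


# Low-capacity model: linear in the raw coordinates
w_linear = train_logistic(X, y)
acc_linear = accuracy(X, y, w_linear)

# Higher-capacity model: same learner, with squared coordinates added
X_quad = np.column_stack([X, X ** 2])
w_quad = train_logistic(X_quad, y)
acc_quadratic = accuracy(X_quad, y, w_quad)

print(f"Training accuracy (linear features):    {acc_linear:.3f}")
print(f"Training accuracy (quadratic features): {acc_quadratic:.3f}")
```

Running more epochs does not help the linear version, because the limitation lies in the family of decision boundaries it can represent and not in the optimization. The same learner separates the rings once it is given squared features.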

In a machine learning task, our goal is to achieve the maximum accuracy, starting from the training set and then moving on to the validation set. More formally, we want to improve our models so as to get as close as possible to the Bayes accuracy. This is not a value that can normally be computed exactly, but a theoretical upper limit on the accuracy that any estimator can achieve. In the following diagram, we see a representation of this process:

Accuracy level diagram
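To make the idea of an upper limit more concrete, here is a minimal sketch based on a synthetic problem where the data-generating distributions are known, so the Bayes accuracy can actually be computed. The setup is an assumption chosen for this example: two equally likely classes drawn from the one-dimensional Gaussians N(-1, 1) and N(+1, 1). Because the classes overlap, even the optimal decision rule makes mistakes, and its accuracy is about 0.84.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic problem: equal priors, class 0 ~ N(-mu, sigma^2),
# class 1 ~ N(+mu, sigma^2)
mu, sigma, n = 1.0, 1.0, 200_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(y == 1, mu, -mu), scale=sigma)

# With equal priors and equal variances, the Bayes-optimal rule
# predicts class 1 whenever x > 0
bayes_pred = (x > 0).astype(int)
empirical_accuracy = float(np.mean(bayes_pred == y))

# Closed form for this specific setup only: Phi(mu / sigma),
# where Phi is the standard normal CDF
bayes_accuracy = 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

print(f"Bayes accuracy (closed form):   {bayes_accuracy:.4f}")
print(f"Accuracy of the optimal rule:   {empirical_accuracy:.4f}")
```

No classifier trained on samples of this problem can be expected to beat this value on new data, however large its capacity. In real tasks the distributions are unknown, so this computation is not available.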

Bayes accuracy is often a purely theoretical limit and, for many tasks, it's almost impossible to achieve even for biological systems; however, advancements in the field of deep learning make it possible to create models whose target accuracy is slightly below the Bayes one. In general, there's no closed form for determining the Bayes accuracy, so human abilities are taken as a benchmark. In the previous classification example, a human being can immediately distinguish between the different dot classes, but the problem can be very hard for a limited-capacity classifier. Some of the models we're going to discuss can solve this problem with a very high target accuracy, but at that point we run another risk, which can be understood after defining the concept of variance of an estimator.
