
General issues in machine learning models

When we use input data for training, validation, and testing, a learning algorithm usually cannot learn with 100% accuracy, which results in training, validation, and test error (or loss). There are two types of error that one can encounter in a machine learning model:

  • Irreducible error
  • Reducible error

The irreducible error cannot be reduced even with the most robust and sophisticated model. However, the reducible error, which has two components, called bias and variance, can be reduced. Therefore, to understand the model (that is, prediction errors), we need to focus on bias and variance only:

  • Bias means how far the predicted values are from the actual values. If the average predicted values are very different from the actual values (labels), then the bias is high.
  • An ML model has high bias when it cannot model the relationship between the input and output variables (it cannot capture the complexity of the data well) and is therefore too simple. Such an overly simple model with high bias causes underfitting of the data.

The following diagram gives some high-level insights and also shows what a just-right fit model should look like.

Variance signifies the variability of the model's predictions for a given data point (how scattered the predictions are when the model is trained on different subsets of the data).
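For squared-error loss, these informal descriptions correspond to the standard bias-variance decomposition of the expected prediction error. The formula below is a general sketch rather than something taken from this chapter; it uses generic symbols, with f for the true function, f-hat for the learned model, and sigma squared for the irreducible noise:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```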

Identifying high bias and high variance: If the model has a high training error and its validation or test error is similar to the training error, the model has high bias. On the other hand, if the model has a low training error but a high validation or test error, the model has high variance.
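As a rough illustration of this diagnostic rule, here is a minimal Python sketch (not from the book; the function name and the error thresholds are arbitrary assumptions for illustration) that compares training and validation error rates and prints a tentative diagnosis:

```python
# Illustrative only: the thresholds below are assumptions, not universal rules.

def diagnose(train_error, val_error, tolerance=0.05, high_error=0.20):
    """Return a rough bias/variance diagnosis from error rates in [0, 1]."""
    if train_error >= high_error and abs(val_error - train_error) <= tolerance:
        return "high bias (underfitting): both errors are high and similar"
    if train_error < high_error and val_error - train_error > tolerance:
        return "high variance (overfitting): low training error, much higher validation error"
    return "no strong bias/variance signal under these thresholds"

print(diagnose(train_error=0.25, val_error=0.27))  # flags high bias
print(diagnose(train_error=0.03, val_error=0.18))  # flags high variance
```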

An ML model that performs very well on the training set but does not work well on the test set (because of high error rates) is ultimately an overfit model. We can recap overfitting and underfitting once more:

  • Underfitting: If your training and validation error are both relatively equal and very high, then your model is most likely underfitting your training data.
  • Overfitting: If your training error is low and your validation error is high, then your model is most likely overfitting your training data. A just-right fit model learns the training data well and performs well on unseen data too.
Bias-variance trade-off: The tension between high bias and high variance is often called the bias-variance trade-off, because a model cannot be too complex and too simple at the same time: making a model more complex typically lowers bias but raises variance, while simplifying it does the opposite. Ideally, we strive for a model that has both low bias and low variance.
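A short sketch of this trade-off, assuming NumPy and scikit-learn are available (neither is mentioned in this section, so treat the snippet as illustrative rather than part of the chapter's code): fitting polynomials of increasing degree to noisy data lets us compare training and validation error as model complexity grows.

```python
# Illustrative sketch of the bias-variance trade-off with polynomial regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 1, 100)).reshape(-1, 1)          # 100 points in [0, 1]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=100)  # noisy target
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

for degree in (1, 4, 15):  # too simple, roughly right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

Typically, the degree-1 model shows both errors high (high bias, underfitting), the degree-4 model keeps both low, and the degree-15 model shows a low training error with a noticeably larger validation error (high variance, overfitting).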

Now we know the basic working principle of an ML algorithm. However, depending on the type of problem and the method used to solve it, ML tasks can differ, for example, supervised learning, unsupervised learning, and reinforcement learning. We'll discuss these learning tasks in more detail in the next section.
