官术网_书友最值得收藏!

How it works...

Model evaluation is a key step in any machine learning process. It is different for supervised and unsupervised models. In supervised models, predictions play a major role; whereas in unsupervised models, homogeneity within clusters and heterogeneity across clusters play a major role.

Some widely used model evaluation parameters for regression models (including cross validation) are as follows:

  • Coefficient of determination
  • Root mean squared error
  • Mean absolute error
  • Akaike or Bayesian information criterion

Some widely used model evaluation parameters for classification models (including cross validation) are as follows:

  • Confusion matrix (accuracy, precision, recall, and F1-score)
  • Gain or lift charts
  • Area under ROC (receiver operating characteristic) curve
  • Concordant and discordant ratio

Some of the widely used evaluation parameters of unsupervised models (clustering) are as follows:

  • Contingency tables
  • Sum of squared errors between clustering objects and cluster centers or centroids
  • Silhouette value
  • Rand index
  • Matching index
  • Pairwise and adjusted pairwise precision and recall (primarily used in NLP)

Bias and variance are two key error components of any supervised model; their trade-off plays a vital role in model tuning and selection. Bias is due to incorrect assumptions made by a predictive model while learning outcomes, whereas variance is due to model rigidity toward the training dataset. In other words, higher bias leads to underfitting and higher variance leads to overfitting of models.

In bias, the assumptions are on target functional forms. Hence, this is dominant in parametric models such as linear regression, logistic regression, and linear discriminant analysis as their outcomes are a functional form of input variables.

Variance, on the other hand, shows how susceptible models are to change in datasets. Generally, target functional forms control variance. Hence, this is dominant in non-parametric models such as decision trees, support vector machines, and K-nearest neighbors as their outcomes are not directly a functional form of input variables. In other words, the hyperparameters of non-parametric models can lead to overfitting of predictive models.

主站蜘蛛池模板: 岢岚县| 涟源市| 富源县| 海淀区| 沾益县| 新化县| 岳普湖县| 砚山县| 巴林右旗| 石泉县| 虞城县| 迁西县| 大余县| 新郑市| 青铜峡市| 彭山县| 虎林市| 贵南县| 茌平县| 洪雅县| 增城市| 阿克苏市| 宁晋县| 金川县| 百色市| 罗平县| 田林县| 汶川县| 南漳县| 建湖县| 兴安县| 余庆县| 禹城市| 屏南县| 寿宁县| 太谷县| 瓦房店市| 湖口县| 兴仁县| 莲花县| 甘孜|