官术网_书友最值得收藏!

H-measure

Binary classification has to apply techniques so that it can map independent variables to different labels. For example, a number of variables exist such as gender, income, number of existing loans, and payment on time/not, that get mapped to yield a score that helps us classify the customers into good customers (more propensity to pay) and bad customers.

Typically, everyone seems to be caught up with the misclassification rate or derived form since the area under curve (AUC) is known to be the best evaluator of our classification model. You get this rate by dividing the total number of misclassified examples by the total number of examples. But does this give us a fair assessment? Let's see. Here, we have a misclassification rate that keeps something important under wraps. More often than not, classifiers come up with a tuning parameter, the side effect of which tends to be favoring false positives over false negatives, or vice versa. Also, picking the AUC as sole model evaluator can act as a double whammy for us. AUC has got different misclassification costs for different classifiers, which is not desirable. This means that using this is equivalent to using different metrics to evaluate different classification rules.

As we have already discussed, the real test of any classifier takes place on the unseen data, and this takes a toll on the model by some decimal points. Adversely, if we have got scenarios like the preceding one, the decision support system will not be able to perform well. It will start producing misleading results.

H-measure overcomes the situation of incurring different misclassification costs for different classifiers. It needs a severity ratio as input, which examines how much more severe misclassifying a class 0 instance is than misclassifying a class 1 instance:

Severity Ratio = cost_0/cost_1

Here, cost_0 > 0 is the cost of misclassifying a class 0 datapoint as class 1.

It is sometimes more convenient to consider the normalized cost c = cost_0/(cost_0 + cost_1) instead. For example, severity.ratio = 2 implies that a false positive costs twice as much as a false negative.

主站蜘蛛池模板: 阜新| 五河县| 大埔县| 固始县| 南涧| 册亨县| 阿拉尔市| 调兵山市| 嫩江县| 凉城县| 湟中县| 遂川县| 岫岩| 探索| 天台县| 洪洞县| 汕头市| 芷江| 三都| 建湖县| 绥江县| 河间市| 石阡县| 林西县| 溧水县| 五大连池市| 离岛区| 彭水| 阿拉尔市| 兴义市| 双江| 聂荣县| 重庆市| 叙永县| 广水市| 汕尾市| 南昌市| 临安市| 武宁县| 石河子市| 上蔡县|