
A trade-off between homogeneity and completeness using the V-measure

The reader who's familiar with supervised learning should know the concept of F-score (or F-measure), which is the harmonic mean of precision and recall. The same kind of trade-off can also be employed when evaluating clustering results given the ground truth.

In fact, in many cases, it's helpful to have a single measure that takes into account both homogeneity and completeness. This can easily be achieved using the V-measure (or V-score), which is defined as the harmonic mean of homogeneity h and completeness c:

$$V = \frac{2 \cdot h \cdot c}{h + c}$$

For the Breast Cancer Wisconsin dataset, the V-measure is as follows:

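from sklearn.metrics import v_measure_score

# kmdff is the DataFrame built in the previous steps: 'diagnosis' holds the
# ground-truth labels and 'prediction' holds the predicted cluster labels
print('V-Score: {}'.format(v_measure_score(kmdff['diagnosis'], kmdff['prediction'])))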

The output of the previous snippet is as follows:

V-Score: 0.46479332792160793
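To make the relationship explicit, the following minimal sketch recomputes the V-measure by hand as the harmonic mean of homogeneity and completeness (which is how scikit-learn defines it with its default weighting) and compares the result with v_measure_score. The small label arrays y_true and y_pred are hypothetical toy data, not the Breast Cancer Wisconsin assignments.

```python
import numpy as np
from sklearn.metrics import (completeness_score, homogeneity_score,
                             v_measure_score)

# Hypothetical toy labels (not the Breast Cancer Wisconsin data)
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 2, 2])

h = homogeneity_score(y_true, y_pred)
c = completeness_score(y_true, y_pred)

# V-measure computed manually as the harmonic mean of h and c
v_manual = 2.0 * h * c / (h + c)
v_sklearn = v_measure_score(y_true, y_pred)

print('Homogeneity: {:.4f}'.format(h))
print('Completeness: {:.4f}'.format(c))
print('V (manual): {:.4f}, V (sklearn): {:.4f}'.format(v_manual, v_sklearn))
```

Because it is a harmonic mean, V always lies between the two scores and is pulled toward the smaller one, which is why a single weak component (here, completeness) drags the overall value down.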

As expected, the V-score is a (harmonic) average that, in this case, is negatively influenced by the lower homogeneity. Of course, this index doesn't provide any new information; it's helpful only to synthesize completeness and homogeneity into a single value.

However, with a few simple but tedious mathematical manipulations, it's possible to prove that the V-measure is also symmetric (that is, V(Ypred|Ytrue) = V(Ytrue|Ypred)); therefore, given two independent assignments Y1 and Y2, V(Y1|Y2) is a measure of agreement between them. Such a scenario is not extremely common, because other measures are usually better suited to it. However, such a score could be employed, for example, to check whether two algorithms (possibly based on different strategies) tend to produce the same assignments or whether they are discordant. In the latter case, even if the ground truth is unknown, the data scientist can infer that one strategy is probably not as effective as the other and start an exploration process to find the optimal clustering algorithm.
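A minimal sketch of this agreement check, assuming scikit-learn is available. The synthetic data from make_blobs, the choice of K-means and Ward-linkage agglomerative clustering as the two strategies, and the random baseline are all illustrative assumptions rather than part of the original example; the true labels are deliberately discarded to mimic the case where the ground truth is unknown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (completeness_score, homogeneity_score,
                             v_measure_score)

# Hypothetical synthetic dataset; the true labels are discarded on purpose
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.5,
                  random_state=1000)

# Two assignments produced by different strategies
y_km = KMeans(n_clusters=3, n_init=10, random_state=1000).fit_predict(X)
y_ag = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)

# Swapping the arguments swaps homogeneity and completeness,
# so the V-measure (their harmonic mean) does not change
print('h(Y1, Y2) = {:.4f}, c(Y2, Y1) = {:.4f}'.format(
    homogeneity_score(y_km, y_ag), completeness_score(y_ag, y_km)))
v_12 = v_measure_score(y_km, y_ag)
v_21 = v_measure_score(y_ag, y_km)
print('V(Y1|Y2) = {:.4f}, V(Y2|Y1) = {:.4f}'.format(v_12, v_21))

# A random assignment, used as a clearly discordant baseline
rng = np.random.default_rng(1000)
y_rand = rng.integers(0, 3, size=X.shape[0])
v_rand = v_measure_score(y_km, y_rand)
print('V(K-means|random) = {:.4f}'.format(v_rand))
```

A value close to 1 suggests that the two algorithms largely agree, while a value close to 0 (as with the random baseline) signals discordant assignments that deserve further investigation.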
