- Hands-On Unsupervised Learning with Python
- Giuseppe Bonaccorso
- 2021-07-02 12:32:06
A trade-off between homogeneity and completeness using the V-measure
Readers familiar with supervised learning will know the F-score (or F-measure), which is the harmonic mean of precision and recall. The same kind of trade-off can also be employed when evaluating clustering results, provided that the ground truth is available.
In fact, in many cases, it's helpful to have a single measure that takes into account both homogeneity and completeness. Such a result can be easily achieved using the V-measure (or V-score), which is defined as the harmonic mean of the homogeneity h and the completeness c:

$$V = \frac{2 \cdot h \cdot c}{h + c}$$

For the Breast Cancer Wisconsin dataset, the V-measure is as follows:
from sklearn.metrics import v_measure_score
print('V-Score: {}'.format(v_measure_score(kmdff['diagnosis'], kmdff['prediction'])))
The output of the previous snippet is as follows:
V-Score: 0.46479332792160793
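Since the V-measure is the harmonic mean of homogeneity and completeness, the relationship can be checked numerically. The following is a minimal sketch that assumes small toy label vectors (`y_true` and `y_pred` are hypothetical and are not the Breast Cancer Wisconsin assignments stored in `kmdff`); it compares the manually computed harmonic mean with the value returned by `v_measure_score`:

```python
from sklearn.metrics import (
    completeness_score,
    homogeneity_score,
    v_measure_score,
)

# Hypothetical toy labels (assumption: used only for illustration)
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]

# Homogeneity and completeness computed separately
h = homogeneity_score(y_true, y_pred)
c = completeness_score(y_true, y_pred)

# Harmonic mean of the two scores (valid only when h + c > 0)
v_manual = 2.0 * h * c / (h + c)

# Value computed directly by scikit-learn (default beta=1.0)
v = v_measure_score(y_true, y_pred)

print('Homogeneity: {}'.format(h))
print('Completeness: {}'.format(c))
print('V-Score (manual): {}'.format(v_manual))
print('V-Score (sklearn): {}'.format(v))
```

The two V-Score values should coincide up to floating-point precision, and both lie between the homogeneity and the completeness.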
As expected, the V-score is a harmonic mean of the two measures and, in this case, it is negatively influenced by the lower homogeneity. Of course, this index doesn't provide any new information; it's helpful only to synthesize completeness and homogeneity in a single value.

However, with a few simple but tedious mathematical manipulations, it's possible to prove that the V-measure is also symmetric (that is, V(Ypred|Ytrue) = V(Ytrue|Ypred)); therefore, given two independent assignments Y1 and Y2, V(Y1|Y2) is a measure of agreement between them. Such a scenario is not extremely common, because other measures can achieve a better result. However, such a score could be employed, for example, to check whether two algorithms (possibly based on different strategies) tend to produce the same assignments or whether they are discordant. In the latter case, even if the ground truth is unknown, the data scientist can conclude that one strategy is likely not as effective as the other and start an exploration process to find the optimal clustering algorithm.
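The agreement check between two algorithms can be sketched as follows. This is only an illustrative example under stated assumptions: it uses the copy of the Breast Cancer Wisconsin data bundled with scikit-learn (`load_breast_cancer`) instead of the `kmdff` data frame, and the choice of K-means and agglomerative clustering with two clusters, as well as the `random_state` value, are arbitrary. No ground truth is used:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import v_measure_score
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the original CSV
X = load_breast_cancer().data
X = StandardScaler().fit_transform(X)

# Two clustering strategies producing independent assignments
labels_km = KMeans(n_clusters=2, n_init=10, random_state=1000).fit_predict(X)
labels_ag = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# The V-measure is symmetric, so the argument order should not matter
v_12 = v_measure_score(labels_km, labels_ag)
v_21 = v_measure_score(labels_ag, labels_km)

print('V(Y1|Y2): {}'.format(v_12))
print('V(Y2|Y1): {}'.format(v_21))
```

A value close to 1 suggests that the two strategies produce nearly the same partition, while a low value indicates discordant assignments that deserve further exploration.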