- The Unsupervised Learning Workshop
- Aaron Jones Christopher Kruger Benjamin Johnston
- 2021-06-18 18:12:52
k-means versus Hierarchical Clustering
In the previous chapter, we explored the merits of k-means clustering. Now, it is important to see where hierarchical clustering fits into the picture. As we mentioned in the Linkage section, the two approaches can overlap directly when centroids are used to group data points together. All of the approaches we've covered so far also rely on a distance function to determine similarity. Because we explored the Euclidean distance in depth in the previous chapter, we used it here as well, but any distance function can be used to determine similarity.
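To make the point about interchangeable distance functions concrete, here is a minimal sketch in plain Python. The helper names (`euclidean`, `manhattan`, `nearest_centroid`) and the sample coordinates are illustrative assumptions, not code from the earlier chapters. It shows that the same point can be assigned to a different centroid depending on which distance function is used.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_centroid(point, centroids, distance):
    """Return the index of the centroid closest to `point` under `distance`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

# Hypothetical centroids and a single point to assign.
centroids = [(0.0, 0.0), (5.0, 2.0)]
point = (2.0, 2.0)

# The assignment step is identical; only the distance function changes.
euclidean_choice = nearest_centroid(point, centroids, euclidean)
manhattan_choice = nearest_centroid(point, centroids, manhattan)

print("Euclidean picks centroid", euclidean_choice)
print("Manhattan picks centroid", manhattan_choice)
```

Because the distance function is passed in as an argument, swapping in another metric only requires writing one more small function.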
In practice, here are some quick highlights for choosing one clustering method over another:
- Hierarchical clustering benefits from not needing to pass in an explicit "k" number of clusters a priori. This means that you can find all the potential clusters and decide which clusters make the most sense after the algorithm has completed.
- k-means clustering benefits from its simplicity. In business use cases, it is often a challenge to find methods that can be explained to non-technical audiences but are still accurate enough to generate quality results. k-means can easily fill this niche.
- Hierarchical clustering has more parameters to tweak than k-means clustering when it comes to dealing with irregularly shaped data. While k-means is great at finding discrete, well-separated clusters, it can falter when clusters are mixed together. By tweaking the parameters in hierarchical clustering, such as the linkage criterion and the distance function, you may find better results.
- Vanilla k-means clustering works by instantiating random centroids and assigning each point to its closest centroid. If the centroids are randomly instantiated in areas of the feature space that are far away from your data, the algorithm can take quite some time to converge, or it may settle on a poor set of clusters. Hierarchical clustering does not rely on random starting centroids, so it is less prone to this weakness.
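The first highlight above, not having to fix "k" in advance, can be sketched with SciPy's hierarchical clustering functions. This is a minimal illustration under stated assumptions: the three synthetic blobs, the seed, and the choice of average linkage are ours for demonstration and are not taken from the chapter's exercises. The hierarchy is built once, and the same tree is then cut into different numbers of clusters without rerunning the algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (an assumption for illustration): three tight,
# well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Build the full hierarchy once. No number of clusters is passed in here.
tree = linkage(data, method="average", metric="euclidean")

# Cut the same tree at several candidate cluster counts after the fact.
cuts = {k: fcluster(tree, t=k, criterion="maxclust") for k in (2, 3, 4)}

for k, labels in cuts.items():
    # fcluster labels start at 1, so drop the unused zero bin.
    print(k, "clusters, sizes:", np.bincount(labels)[1:])
```

With k-means, each candidate value of "k" would require a separate run, and each run would start from its own set of initial centroids. Here, the cluster sizes printed for each cut can be compared directly to decide which grouping makes the most sense.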