
  • The Unsupervised Learning Workshop
  • Aaron Jones, Christopher Kruger, Benjamin Johnston
  • 336 words
  • 2021-06-18 18:12:52

Summary

In this chapter, we discussed how hierarchical clustering works and where it may be best employed. In particular, we discussed various aspects of how clusters can be subjectively chosen through the evaluation of a dendrogram plot. This is a huge advantage over k-means clustering if you have absolutely no idea of what you're looking for in the data. Two key parameters that drive the success of hierarchical clustering were also discussed: the agglomerative versus divisive approach and linkage criteria. Agglomerative clustering takes a bottom-up approach by recursively grouping nearby data together until it results in one large cluster. Divisive clustering takes a top-down approach by starting with the one large cluster and recursively breaking it down until each data point falls into its own cluster. Divisive clustering has the potential to be more accurate since it has a complete view of the data from the start; however, it adds a layer of complexity that can decrease the stability and increase the runtime.
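The bottom-up process described above can be sketched with SciPy's hierarchical clustering routines. This is a minimal, hedged example on made-up toy data (the two blobs and the choice of two flat clusters are illustrative assumptions, not drawn from the chapter):

```python
# A minimal sketch of agglomerative (bottom-up) clustering with SciPy.
# The toy data and the cut into two flat clusters are assumptions for
# illustration only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated blobs of points.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),  # blob near (0, 0)
    rng.normal(loc=5.0, scale=0.3, size=(10, 2)),  # blob near (5, 5)
])

# Bottom-up merging: every point starts as its own cluster, and the two
# nearest clusters are merged at each step until one cluster remains.
merges = linkage(data, method='ward')

# Cut the merge tree to recover a flat assignment of two clusters,
# mirroring what you would do visually with a dendrogram plot.
labels = fcluster(merges, t=2, criterion='maxclust')
print(sorted(set(labels)))
```

The `merges` matrix is the same structure that `scipy.cluster.hierarchy.dendrogram` plots, so the subjective "where to cut" decision discussed above corresponds directly to the `t` threshold passed to `fcluster`.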

Linkage criteria grapple with the concept of how distance is calculated between candidate clusters. We have explored how centroids can make an appearance again beyond k-means clustering, as well as single and complete linkage criteria. Single linkage finds cluster distances by comparing the closest points in each cluster, while complete linkage finds cluster distances by comparing the most distant points in each cluster. With the knowledge that you have gained in this chapter, you are now able to evaluate how both k-means and hierarchical clustering can best fit the challenge that you are working on.
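The single versus complete linkage distinction can be shown directly on two small candidate clusters. The points below are made up purely to illustrate the two distance rules:

```python
# A hedged illustration of single vs. complete linkage. The two candidate
# clusters are invented points chosen so the distances are easy to verify.
import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[3.0, 0.0], [6.0, 0.0]])

# Pairwise distances between every point in A and every point in B.
pairwise = cdist(cluster_a, cluster_b)

# Single linkage: distance between the closest pair across the clusters.
single = pairwise.min()    # (1, 0) to (3, 0) -> 2.0

# Complete linkage: distance between the most distant pair.
complete = pairwise.max()  # (0, 0) to (6, 0) -> 6.0

print(single, complete)  # prints 2.0 6.0
```

Because single linkage only needs one close pair to merge clusters, it tends to chain elongated clusters together, while complete linkage favors compact, roughly equal-diameter clusters.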

While hierarchical clustering can result in better performance than k-means due to its increased complexity, please remember that more complexity is not always good. Your duty as a practitioner of unsupervised learning is to explore all the options and identify the solution that is both resource-efficient and performant. In the next chapter, we will cover a clustering approach that will serve us best when it comes to highly complex and noisy data: Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
