- Hands-On Unsupervised Learning with Python
- Giuseppe Bonaccorso
- 2021-07-02 12:32:04
K-means
K-means is the simplest implementation of the principle of maximum separation and maximum internal cohesion. Suppose we have a dataset $X \in \mathbb{R}^{M \times N}$ (that is, M N-dimensional samples) that we want to split into K clusters, together with a set of K centroids corresponding to the means of the samples assigned to each cluster $K_j$:

$$M^{(t)} = \left\{ \mu_1^{(t)}, \mu_2^{(t)}, \ldots, \mu_K^{(t)} \right\}$$

The set M and the centroids carry an additional index (as a superscript) indicating the iterative step t. Starting from an initial guess $M^{(0)}$, K-means tries to minimize an objective function called inertia (that is, the sum of the squared distances between the samples assigned to each cluster $K_j$ and its centroid $\mu_j$):

$$S^{(t)} = \sum_{j=1}^{K} \sum_{x_i \in K_j} \left\| x_i - \mu_j^{(t)} \right\|_2^2$$
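
To make the definition of inertia concrete, here is a minimal NumPy sketch. The function name `inertia`, the `labels` array (the cluster index of each sample), and the toy data are assumptions introduced for illustration; they do not come from the original text.

```python
import numpy as np

def inertia(X, centroids, labels):
    """Sum of squared Euclidean distances between each sample and its assigned centroid."""
    diffs = X - centroids[labels]  # row i holds x_i minus the centroid of its cluster
    return float(np.sum(diffs ** 2))

# Toy data (hypothetical): two well-separated groups of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

good = np.array([[0.0, 1.0], [10.0, 1.0]])  # centroids placed at the cluster means
bad = np.array([[0.0, 3.0], [10.0, 3.0]])   # centroids shifted away from the means

s_good = inertia(X, good, labels)
s_bad = inertia(X, bad, labels)

# Same partition on data rescaled by a factor of 10
s_scaled = inertia(10 * X, 10 * good, labels)
```

With centroids at the cluster means the value is lower than with shifted centroids. Rescaling the data by 10 multiplies the inertia by 100 even though the partition is identical, so the value is only meaningful when compared across iterations on the same dataset.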

$S^{(t)}$ cannot be treated as an absolute measure, because its value depends strongly on the variance of the samples. However, $S^{(t+1)} < S^{(t)}$ implies that the centroids are moving closer to a position where the points assigned to each cluster have the smallest possible distance to the corresponding centroid. Hence, the iterative procedure (also known as Lloyd's algorithm) starts by initializing $M^{(0)}$ with random values. The next step is the assignment of each sample $x_i \in X$ to the cluster whose centroid has the smallest distance from $x_i$ (here $c(x_i)$ denotes the index of the assigned cluster):

$$c\left(x_i\right) = \underset{j}{\operatorname{argmin}} \left\| x_i - \mu_j^{(t)} \right\|_2$$
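
The assignment step can be sketched in a few lines of NumPy. The helper name `assign_clusters` and the sample values are hypothetical, and the Euclidean distance is assumed as the metric.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Return, for each sample, the index of the nearest centroid (Euclidean distance)."""
    # Broadcasting builds an (M, K) matrix of sample-to-centroid distances
    distances = np.linalg.norm(X[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
    return np.argmin(distances, axis=1)

# Toy data (hypothetical): four points on a line and two candidate centroids
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
centroids = np.array([[2.0, 0.0], [8.0, 0.0]])

labels = assign_clusters(X, centroids)
```

The first two samples are closer to the first centroid and the last two to the second one, so they end up in different clusters.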

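Once all assignments have been completed, the new centroids are recomputed as the arithmetic means of the samples in each cluster, where $N_{K_j}$ is the number of samples assigned to $K_j$:

$$\mu_j^{(t+1)} = \frac{1}{N_{K_j}} \sum_{x_i \in K_j} x_i$$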
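
The update step can be sketched as follows. The helper name `update_centroids` is hypothetical, and keeping the previous centroid when a cluster receives no samples is an assumption of this sketch, since the text does not discuss empty clusters.

```python
import numpy as np

def update_centroids(X, labels, old_centroids):
    """Recompute each centroid as the arithmetic mean of the samples assigned to it."""
    new_centroids = old_centroids.astype(float)  # astype returns a copy
    for j in range(old_centroids.shape[0]):
        members = X[labels == j]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

# Toy data (hypothetical): same layout as a two-cluster assignment on a line
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1])
old_centroids = np.array([[2.0, 0.0], [8.0, 0.0]])

new_centroids = update_centroids(X, labels, old_centroids)
```

Each centroid moves to the mean of its assigned samples, here from x = 2 to x = 0.5 and from x = 8 to x = 9.5.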

The procedure is repeated until the centroids stop changing (this also implies a sequence $S^{(0)} > S^{(1)} > \ldots > S^{(t_{end})}$). The computational time is therefore highly influenced by the initial guess. If $M^{(0)}$ is very close to $M^{(t_{end})}$, a few iterations are enough to reach the final configuration. Conversely, when $M^{(0)}$ is purely random, the probability of an inefficient initial choice is close to 1 (that is, every initial uniform random choice is almost equivalent in terms of computational complexity).
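
To illustrate how the starting point affects the number of iterations, the whole procedure can be put together in a short sketch. The function name `lloyd`, the `max_iter` safeguard, the stopping test based on `np.allclose`, and the one-dimensional toy dataset are all assumptions made for this example; the two initial guesses are fixed by hand rather than drawn at random so that the run is reproducible.

```python
import numpy as np

def lloyd(X, init_centroids, max_iter=100):
    """Minimal Lloyd's algorithm: returns centroids, labels, inertia history, iterations."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    history = []
    for n_iter in range(1, max_iter + 1):
        # Assignment step: nearest centroid for every sample
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Inertia of the current configuration
        history.append(float(np.sum((X - centroids[labels]) ** 2)))
        # Update step: mean of the samples in each cluster
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # assumption: empty clusters keep their centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped changing
        centroids = new_centroids
    return centroids, labels, history, n_iter

# Toy dataset (hypothetical): two groups of three points on a line
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Initial guess already at the final centroids
c_good, _, hist_good, n_good = lloyd(X, [[1.0], [11.0]])

# Initial guess with both centroids inside the same group
c_poor, _, hist_poor, n_poor = lloyd(X, [[0.0], [1.0]])
```

Both runs end with the same centroids, but the second one needs more passes over the data, and its inertia history decreases at every step before the centroids stabilize.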