官术网_书友最值得收藏!

K-means

K-means is the simplest implementation of the principle of maximum separation and maximum internal cohesion. Let's suppose we have a dataset X ∈ ?M×N (that is, M N-dimensional samples) that we want to split into K clusters and a set of K centroids corresponding to the means of the samples assigned to each cluster Kj:

The set M and the centroids have an additional index (as a superscript) indicating the iterative step. Starting from an initial guess M(0), K-means tries to minimize an objective function called inertia (that is, the total average intra-cluster distance between samples assigned to a cluster Kj and its centroid μj):

It's easy to understand that S(t) cannot be considered as an absolute measure because its value is highly influenced by the variance of the samples. However, S(t+1) < S(t) implies that the centroids are moving closer to an optimal position where the points assigned to a cluster have the smallest possible distance to the corresponding centroid. Hence, the iterative procedure (also known as Lloyd's algorithm) starts by initializing M(0) with random values. The next step is the assignment of each sample xi ∈ X to the cluster whose centroid has the smallest distance from xi:

Once all assignments have been completed, the new centroids are recomputed as arithmetic means:

The procedure is repeated until the centroids stop changing (this implies also a sequence S(0) > S(1) > ... > S(tend)). The reader should have immediately understood that the computational time is highly influenced by the initial guess. If M(0) is very close to M(tend), a few iterations can find the optimal configuration. Conversely, when M(0) is purely random, the probability of an inefficient initial choice is close to 1 (that is, every initial uniform random choice is almost equivalent in terms of computational complexity).

主站蜘蛛池模板: 晋江市| 营口市| 武鸣县| 于都县| 阿克苏市| 平昌县| 奇台县| 浙江省| 康平县| 青岛市| 文昌市| 内江市| 阆中市| 乌苏市| 凤庆县| 会宁县| 鹿邑县| 马公市| 邛崃市| 乌拉特后旗| 红安县| 石嘴山市| 麦盖提县| 化州市| 周至县| 阳信县| 顺平县| 宁海县| 杨浦区| 常州市| 遂溪县| 辽宁省| 扬中市| 高唐县| 柳河县| 武义县| 巴彦淖尔市| 忻城县| 稷山县| 清远市| 麟游县|