
Isomap

Isomap is one of the simplest manifold learning algorithms, and it's based on the idea of reducing the dimensionality while trying to preserve the geodesic distances measured on the original manifold where the input data lies. The algorithm works in three steps. The first operation is a k-nearest neighbors search and the construction of the corresponding graph: the vertices are the samples, while the edges represent the connections among nearest neighbors, and their weights are proportional to the distances to the corresponding neighbors.
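As a sketch of this first step (with a hypothetical Swiss-roll dataset and k = 10, both chosen only for illustration), the neighbor graph can be built with scikit-learn's kneighbors_graph function, which returns a sparse adjacency matrix whose non-zero entries are the distances between each sample and its k nearest neighbors:

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Hypothetical dataset lying on a non-linear manifold
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=1000)

# Step 1: k-nearest neighbors graph; with mode='distance' the edge weights
# are the Euclidean distances between each sample and its neighbors
knn_graph = kneighbors_graph(X, n_neighbors=10, mode='distance')

print(knn_graph.shape)
# (1000, 1000) sparse matrix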

The second step employs the Dijkstra algorithm to compute the shortest pairwise distances on the graph for all couples of samples. The following figure shows a portion of such a graph, where some shortest distances are marked:

 

Example of a graph with marked shortest distances

For example, as x_3 is a neighbor of x_5 and x_7, applying the Dijkstra algorithm, we could get the shortest paths d(x_3, x_5) = w_53 and d(x_3, x_7) = w_73. The computational complexity of this step is about O(n² log n + n²k), which is lower than O(n³) when k << n (a condition normally met); however, for large graphs (with n >> 1), this is often the most expensive part of the whole algorithm.
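Continuing the previous sketch, the pairwise graph distances can be computed with SciPy's shortest_path function (method='D' selects the Dijkstra algorithm):

from scipy.sparse.csgraph import shortest_path

# Step 2: all pairwise geodesic (graph) distances; directed=False treats
# the k-NN graph as symmetric. Disconnected pairs would yield inf values.
geodesic_distances = shortest_path(knn_graph, method='D', directed=False)

print(geodesic_distances.shape)
# (1000, 1000)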

The third step is called metric multidimensional scaling (MDS), which is a technique for finding a low-dimensional representation while trying to preserve the inner products among samples. If we have a P-dimensional dataset X, the algorithm must find a Q-dimensional set Φ, with Q < P, that minimizes the function:
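In its standard form (the one assumed here), this function is the sum of the squared discrepancies between the original and the projected inner products:

L(Φ) = Σ_(i,j) (x_i · x_j − φ_i · φ_j)²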

As proven in Semi-Supervised Learning, Chapelle O., Schölkopf B., Zien A. (eds.), The MIT Press, the optimization is achieved by taking the top Q eigenvectors of the Gram matrix G_ij = x_i · x_j (or, in matrix form, G = XX^T if X ∈ ℝ^(n × M)); however, as the Isomap algorithm works with pairwise distances, we need to compute the matrix D of squared distances:
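The entries of D are the squared pairwise distances; in Isomap, these are the squared geodesic distances obtained in the previous step:

D_ij = d²(x_i, x_j)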

If the X dataset is zero-centered, it's possible to derive a simplified Gram matrix from D, as described by M. A. A. Cox and T. F. Cox:
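A standard way to write this simplified Gram matrix (the double-centering form assumed here, where 1 is the column vector of all ones) is:

G_D = -1/2 (I − (1/n) 1 1^T) D (I − (1/n) 1 1^T)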

Isomap computes the top Q eigenvalues λ_1, λ_2, ..., λ_Q of G_D and the corresponding eigenvectors ν_1, ν_2, ..., ν_Q, and determines the Q-dimensional vectors as:
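Under this convention, the i-th low-dimensional vector is built from the scaled components of the top eigenvectors (ν_ki denotes the i-th component of ν_k):

φ_i = (√λ_1 ν_1i, √λ_2 ν_2i, ..., √λ_Q ν_Qi)

Continuing the previous snippets, this third step can be sketched with NumPy (Q = 2 is only illustrative):

import numpy as np

# Step 3: classical metric MDS on the squared geodesic distances
Q = 2
n = geodesic_distances.shape[0]

# Double-centering of the squared distance matrix
D = geodesic_distances ** 2
J = np.eye(n) - np.ones((n, n)) / n
G_D = -0.5 * J @ D @ J

# Top-Q eigenpairs (eigh returns the eigenvalues in ascending order)
eigenvalues, eigenvectors = np.linalg.eigh(G_D)
top = np.argsort(eigenvalues)[::-1][:Q]

# Scale the eigenvectors by the square roots of the eigenvalues
Phi = eigenvectors[:, top] * np.sqrt(eigenvalues[top])

print(Phi.shape)
# (1000, 2)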

As we're going to discuss in Chapter 5, EM Algorithm and Applications (and as also pointed out by Saul L. K., Weinberger K. Q., Sha F., Ham J., and Lee D. D. in Spectral Methods for Dimensionality Reduction), this kind of projection is also exploited by Principal Component Analysis (PCA), which finds the directions with the highest variance, corresponding to the top k eigenvectors of the covariance matrix. In fact, when applying the SVD to the dataset X, we get:
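With the convention that the diagonal matrix collects the square roots of the eigenvalues (the form assumed here), the decomposition reads:

X = U Λ^(1/2) V^T, so that XX^T = U Λ U^T and X^T X = V Λ V^T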

The diagonal matrix Λ contains the eigenvalues of both XX^T and X^TX; therefore, the eigenvalues λ_Gi of G are equal to Mλ_Σi, where λ_Σi are the eigenvalues of the covariance matrix Σ = M⁻¹X^TX. Hence, Isomap achieves the dimensionality reduction by trying to preserve the pairwise distances while projecting the dataset onto the subspace determined by a group of eigenvectors where the maximum explained variance is achieved. In terms of information theory, this condition guarantees the minimum loss together with an effective reduction of dimensionality.
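This eigenvalue relation can be verified numerically with a quick sketch (here M is assumed to denote the number of samples, so that Σ = M⁻¹X^TX is the biased sample covariance of the zero-centered data):

import numpy as np

# Hypothetical zero-centered dataset with M samples and 5 features
rng = np.random.RandomState(1000)
M = 200
X = rng.randn(M, 5)
X -= X.mean(axis=0)

G = X @ X.T                 # Gram matrix of pairwise inner products
Sigma = (X.T @ X) / M       # covariance matrix

lambda_G = np.sort(np.linalg.eigvalsh(G))[::-1][:5]
lambda_Sigma = np.sort(np.linalg.eigvalsh(Sigma))[::-1]

# The non-zero eigenvalues of G are M times the eigenvalues of Sigma
print(np.allclose(lambda_G, M * lambda_Sigma))
# True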

Scikit-learn also implements the Floyd-Warshall algorithm as an alternative for the shortest-path step, but it is slightly slower. For further information, please refer to Introduction to Algorithms, Cormen T. H., Leiserson C. E., Rivest R. L., The MIT Press.
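As a usage sketch (the parameter values are only illustrative), the whole procedure is exposed by the sklearn.manifold.Isomap class; its path_method parameter selects the shortest-path algorithm ('D' for Dijkstra, 'FW' for Floyd-Warshall):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Hypothetical dataset lying on a non-linear manifold
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=1000)

# k = 10 neighbors, projection onto a 2-dimensional space, Dijkstra shortest paths
isomap = Isomap(n_neighbors=10, n_components=2, path_method='D')
X_isomap = isomap.fit_transform(X)

print(X_isomap.shape)
# (1000, 2)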
