t-SNE

This algorithm, proposed by van der Maaten and Hinton and formally known as t-Distributed Stochastic Neighbor Embedding (t-SNE), is one of the most powerful manifold dimensionality reduction techniques. Contrary to the other methods, this algorithm starts with a fundamental assumption: the similarity between two N-dimensional points x_i and x_j can be represented as the conditional probability p(x_j|x_i), where each point is modeled by a Gaussian distribution centered at x_i with variance σ_i². The variances are selected starting from the desired perplexity, defined as:

Perp(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_j p(x_j \mid x_i) \log_2 p(x_j \mid x_i)

Low perplexity values indicate low uncertainty (a small effective number of neighbors) and are normally preferable. In common t-SNE tasks, values in the range of 10 to 50 are generally acceptable.
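As a concrete illustration, the perplexity of a conditional distribution P_i is 2 raised to its Shannon entropy. The following sketch (data and the shared σ value are hypothetical) builds P_i from squared distances and evaluates its perplexity:

```python
import numpy as np

# Illustrative sketch: the conditional distribution P_i induced by a
# Gaussian centered at x_i, and its perplexity Perp(P_i) = 2^H(P_i),
# where H is the Shannon entropy in bits.
def conditional_probabilities(distances_sq, sigma):
    # distances_sq: squared distances from x_i to every other point x_j
    p = np.exp(-distances_sq / (2.0 * sigma ** 2))
    return p / p.sum()

def perplexity(p):
    h = -np.sum(p * np.log2(p))  # Shannon entropy H(P_i)
    return 2.0 ** h

d_sq = np.random.default_rng(0).uniform(0.5, 5.0, size=50)
print(perplexity(conditional_probabilities(d_sq, sigma=1.0)))
```

The perplexity always lies between 1 and the number of neighbors, and it grows with σ, which is why it is interpreted as an effective number of neighbors.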

The assumption on the conditional probabilities can be interpreted as follows: if two samples are very similar, the probability of the first sample conditioned on the second one is high, while dissimilar points yield low conditional probabilities. For example, thinking about images, a point in the pupil can have as neighbors some points belonging to an eyelash. In terms of probabilities, we can say that p(eyelash|pupil) is quite high, while p(nose|pupil) is obviously lower. t-SNE models these conditional probabilities as:

p(x_j \mid x_i) = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}

The probabilities p(x_i|x_i) are set to zero, so the previous formula can be extended to all pairs of points. In order to simplify the optimization problem, the conditional probabilities are also symmetrized:

p_{ij} = \frac{p(x_j \mid x_i) + p(x_i \mid x_j)}{2N}

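A minimal sketch of this construction, assuming a single shared σ for simplicity (whereas t-SNE tunes each σ_i to match the desired perplexity):

```python
import numpy as np

# Sketch (not the reference implementation) of the symmetrized joint
# probabilities p_ij built from pairwise squared distances.
def joint_probabilities(X, sigma=1.0):
    n = X.shape[0]
    d_sq = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)
    p_cond = np.exp(-d_sq / (2.0 * sigma ** 2))
    np.fill_diagonal(p_cond, 0.0)                 # p(x_i | x_i) = 0
    p_cond /= p_cond.sum(axis=1, keepdims=True)   # normalize each row
    return (p_cond + p_cond.T) / (2.0 * n)        # symmetrization

X = np.random.default_rng(1).normal(size=(20, 5))
P = joint_probabilities(X)
print(P.sum())  # a valid joint distribution: sums to 1
```

The division by 2N guarantees that the symmetrized matrix still sums to one, so it can be treated as a joint probability distribution over all pairs.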
The probability distribution so obtained represents the high-dimensional input relationships. As our goal is to reduce the dimensionality to a value M < N, we can define a similar probabilistic representation for the target points φ_i, using a Student's t-distribution with one degree of freedom:

q_{ij} = \frac{\left(1 + \lVert \phi_i - \phi_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert \phi_k - \phi_l \rVert^2\right)^{-1}}

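The low-dimensional similarities can be sketched in the same way; here the Gaussian kernel is replaced by the Student's t kernel with one degree of freedom (a Cauchy kernel), normalized over all pairs:

```python
import numpy as np

# Sketch of the low-dimensional similarities q_ij on the embedded
# points phi_i; the heavy-tailed kernel lets moderately distant points
# sit further apart in the embedding.
def low_dim_affinities(Phi):
    d_sq = np.square(Phi[:, None, :] - Phi[None, :, :]).sum(axis=-1)
    inv = 1.0 / (1.0 + d_sq)       # (1 + ||phi_i - phi_j||^2)^-1
    np.fill_diagonal(inv, 0.0)     # exclude the q_ii terms
    return inv / inv.sum()         # single normalization over all pairs

Phi = np.random.default_rng(2).normal(size=(20, 2))
Q = low_dim_affinities(Phi)
print(Q.sum())  # sums to 1
```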
We want the low-dimensional distribution Q to be as close as possible to the high-dimensional distribution P; therefore, the aim of the t-SNE algorithm is to minimize the Kullback-Leibler divergence between P and Q:

D_{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} = \sum_i \sum_j p_{ij} \log p_{ij} - \sum_i \sum_j p_{ij} \log q_{ij}

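Given the two joint matrices P and Q, the cost can be evaluated directly; entries with p_ij = 0 (for example, the diagonal) are skipped, consistent with the convention 0·log 0 = 0:

```python
import numpy as np

# Sketch: Kullback-Leibler divergence KL(P || Q) between two joint
# probability matrices with matching supports.
def kl_divergence(P, Q):
    mask = P > 0                  # skip zero entries (0 * log 0 = 0)
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

P = np.full((4, 4), 1.0 / 16.0)
print(kl_divergence(P, P))  # 0.0 when the distributions coincide
```

By Gibbs' inequality the divergence is non-negative and vanishes only when Q equals P, which is exactly the condition the optimization pushes toward.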
The first term is related to the entropy of the original distribution P (it is constant, since P does not depend on the projection), while the second one is the cross-entropy H(P, Q), which is the only part that has to be minimized. The standard approach is based on a gradient-descent algorithm, but there are also some useful variations that can improve the performance, discussed in Visualizing Data using t-SNE, van der Maaten L.J.P., Hinton G.E., Journal of Machine Learning Research 9 (Nov), 2008.
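In practice, the gradient descent is rarely implemented by hand; for instance, scikit-learn provides a TSNE estimator (the class and parameters below follow its public API, while the data is hypothetical):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical data: 100 samples in a 20-dimensional space
X = np.random.default_rng(0).normal(size=(100, 20))

# Embed into 2 dimensions with a perplexity inside the usual 10-50 range
tsne = TSNE(n_components=2, perplexity=30.0, random_state=0)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)  # (100, 2)
```

Note that the perplexity must be smaller than the number of samples, and that different random seeds can yield visibly different (though equally valid) embeddings, since the cost function is non-convex.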