
Manifold assumption

This is the less intuitive assumption, but it can be extremely useful to reduce the complexity of many problems. First of all, we can provide a non-rigorous definition of a manifold. An n-manifold is a topological space that is globally curved, but locally homeomorphic to an n-dimensional Euclidean space. In the following diagram, there's an example of a manifold: the surface of a sphere in ℝ³:

2D manifold obtained from a spherical surface

The small patch around P (for ε → 0) can be mapped to a flat circular surface. Therefore, the properties of a manifold are locally based on Euclidean geometry, while, globally, they require a proper mathematical extension that is beyond the scope of this book (further information can be found in Belkin M., Niyogi P., Semi-supervised learning on Riemannian manifolds, Machine Learning 56, 2004).

The manifold assumption states that p-dimensional samples (where p >> 1) approximately lie on a q-dimensional manifold with q << p. Without excessive mathematical rigor, we can say that, for example, if we have N bounded 1,000-dimensional vectors, they are enclosed in a 1,000-dimensional hypercube with an edge length equal to r. The corresponding n-volume is r^p = r^1000; since this volume grows exponentially with p while N stays fixed, the probability that the samples fill the entire space is vanishingly small (and decreases further as p grows). What we observe, instead, is a high density of samples on a lower-dimensional manifold. For example, if we look at the Earth from space, we might think that its inhabitants are uniformly distributed over the whole volume. We know that this is false and, in fact, we can create maps and atlases, which are represented on two-dimensional manifolds. It doesn't make sense to use three-dimensional vectors to map the position of a human being; it's easier to use a projection and work with latitude and longitude.
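To make this sparsity argument concrete, here is a minimal sketch (not part of the original text) that samples N points uniformly from a unit hypercube and measures the average distance to the nearest neighbor; the values N = 500 and the chosen dimensionalities are arbitrary, and NumPy and SciPy are assumed to be available:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    N = 500  # number of samples (arbitrary choice for this sketch)

    for p in (2, 10, 100, 1000):
        # N points drawn uniformly from a hypercube with edge length r = 1
        X = rng.uniform(0.0, 1.0, size=(N, p))
        # Pairwise Euclidean distances; exclude self-distances on the diagonal
        D = squareform(pdist(X))
        np.fill_diagonal(D, np.inf)
        nearest = D.min(axis=1)
        print(f"p = {p:4d} -> mean nearest-neighbor distance: {nearest.mean():.3f}")

As p grows, the mean nearest-neighbor distance increases steadily: the same number of samples becomes sparser and sparser in the exponentially larger volume.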

This assumption authorizes us to apply dimensionality reduction methods in order to mitigate the Curse of Dimensionality, theorized by Bellman (in Bellman R., Dynamic Programming, Princeton University Press, 1957). In the scope of machine learning, the main consequence of this effect is that, as the dimensionality of the samples increases, more and more samples are needed to achieve high accuracy. Moreover, Hughes observed (the phenomenon has been named after him and it's presented in Hughes G. F., On the mean accuracy of statistical pattern recognizers, IEEE Transactions on Information Theory, 1968, 14/1) that, for a fixed training-set size, the mean accuracy of statistical classifiers grows up to an optimal dimensionality and then degrades as further dimensions are added. This means that whenever it's possible to work on lower-dimensional manifolds (in particular, in semi-supervised scenarios), two advantages are achieved (a short sketch follows the list below):

  • Reduced computational time and memory consumption
  • Higher classification accuracy
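As an illustrative sketch of such a dimensionality reduction (assuming scikit-learn is available; the dataset and hyperparameters are arbitrary choices, not taken from the text), we can unroll a Swiss-roll dataset, whose points live in ℝ³ but lie on a 2-dimensional manifold, with Isomap:

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    # 1,500 points embedded in R^3, but lying on a 2-dimensional manifold
    X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

    # Isomap approximates geodesic distances along the manifold and embeds
    # the samples into a 2-dimensional Euclidean space
    isomap = Isomap(n_neighbors=10, n_components=2)
    X_2d = isomap.fit_transform(X)

    print(X.shape, '->', X_2d.shape)  # (1500, 3) -> (1500, 2)

Any subsequent classifier can then be trained on X_2d instead of X, with a lower computational cost and, when the manifold assumption holds, comparable or better accuracy.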