Dimensionality reduction

Dimensionality reduction (also referred to as feature reduction or feature selection) is the process of reducing the input set of independent variables to the smaller number of variables that the model actually needs to predict the target.

In certain cases, it is possible to combine multiple independent variables into one without losing much information. For example, instead of keeping two independent variables, the length and the breadth of a rectangle, the dimensions can be represented by a single variable, the area, which is derived from both the length and the breadth.
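The rectangle example can be sketched in a few lines of Python. The data values and the helper name `to_area` are hypothetical and chosen only for illustration; the sketch also assumes that the model needs only the product of the two measurements, which will not hold for every problem.

```python
# Hypothetical toy data: each row holds (length, breadth) of one rectangle.
rectangles = [(2.0, 3.0), (4.0, 5.0), (1.5, 2.0)]


def to_area(rows):
    """Collapse the two features (length, breadth) into one feature (area)."""
    return [length * breadth for length, breadth in rows]


# Two input columns become a single derived column.
areas = to_area(rectangles)
print(areas)
```

Each observation is now described by one number instead of two, at the cost of no longer being able to tell a long, thin rectangle from a square of the same area.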

Dimensionality reduction is performed on an input dataset for several reasons:

  • It aids data compression, so the data fits in less disk space.
  • It reduces processing time, because fewer dimensions are used to represent the data.
  • It removes redundant features from the dataset. Redundancy among features is commonly referred to as multicollinearity.
  • It makes the data easier to visualize through graphs and charts.
  • It removes noisy features from the dataset, which, in turn, can improve model performance.
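One simple way to spot the redundancy mentioned in the list above is to compare features pairwise with the Pearson correlation coefficient and drop any feature that is almost perfectly correlated with one already kept. The following is a minimal sketch of that idea, not a prescribed method: the column names, the data, the helper `drop_redundant`, and the 0.95 threshold are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical dataset: the same length recorded in two units, plus a weight.
columns = {
    "length_cm": [10.0, 20.0, 30.0, 40.0],
    "length_in": [3.94, 7.87, 11.81, 15.75],
    "weight_kg": [5.0, 3.0, 8.0, 4.0],
}

reduced = drop_redundant(columns)
print(list(reduced))
```

Here `length_in` carries the same information as `length_cm`, so only one of the two survives. Which column of a correlated pair is kept depends on iteration order, a design choice that a real pipeline would likely make more deliberately.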

Dimensionality reduction can be achieved in many ways. One way is to use filters, such as information gain filters and symmetric attribute evaluation filters. Genetic-algorithm-based selection and principal component analysis (PCA) are other popular techniques. Hybrid methods that combine these approaches also exist.
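To make PCA more concrete, the sketch below finds the first principal component of a small dataset using power iteration on the sample covariance matrix and then projects each row onto it. It is an illustrative, standard-library-only version under stated assumptions: the function name `first_principal_component` and the data are hypothetical, the starting vector is assumed not to be orthogonal to the dominant direction, and a production workflow would normally rely on an established numerical library instead.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, projected scores) for the first principal component."""
    n = len(rows)
    d = len(rows[0])

    # Center every feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize, which converges to the direction of largest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centered row onto the component: one number per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical data in which the second feature is exactly twice the first.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Because the two features in this toy dataset are perfectly linearly related, a single component captures all of the variance, and the two-column input reduces to the one-column `scores` without loss.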
