
  • MATLAB for Machine Learning
  • Giuseppe Ciaburro
  • 284 words
  • 2021-07-02 19:37:36

Dimensionality reduction

Dimensionality reduction is the process of converting a dataset with many variables into one with fewer dimensions while preserving as much of the original information as possible. It can improve model accuracy and performance, improve interpretability, and help prevent overfitting. The Statistics and Machine Learning Toolbox includes many algorithms and functions for reducing the dimensionality of our datasets. Dimensionality reduction can be divided into feature selection and feature extraction: feature selection approaches try to find a subset of the original variables, while feature extraction reduces dimensionality by transforming the data into new features.

As already mentioned, feature selection finds the subset of measured features (predictor variables) that gives the best predictive performance when modeling the data. The Statistics and Machine Learning Toolbox includes many feature selection methods, as follows:

  • Stepwise regression: Adds or removes features until there is no further improvement in prediction accuracy. Especially suited to linear regression or generalized linear regression models.
  • Sequential feature selection: Equivalent to stepwise regression, but can be applied with any supervised learning algorithm.
  • Selecting features for classifying high-dimensional data.
  • Boosted and bagged decision trees: Calculate variable importance from out-of-bag error estimates.
  • Regularization: Removes redundant features by shrinking their weights toward zero (for example, lasso).
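As a concrete illustration of the second approach above, the toolbox's `sequentialfs` function wraps any supervised learner in a cross-validated search over feature subsets. The sketch below is a minimal example, assuming the built-in `fisheriris` sample dataset and a linear discriminant classifier as the wrapped model; any other classifier could be substituted in the criterion function.

```matlab
% Minimal sketch: sequential feature selection with sequentialfs.
% Assumes the Statistics and Machine Learning Toolbox and the
% built-in fisheriris sample dataset.
load fisheriris                 % meas: 150x4 predictors, species: labels
X = meas;
y = species;

% Criterion function: misclassification count of a linear discriminant
% classifier trained on the training fold and scored on the test fold.
fun = @(XT, yT, Xt, yt) sum(~strcmp(yt, predict(fitcdiscr(XT, yT), Xt)));

opts = statset('Display', 'iter');
inmodel = sequentialfs(fun, X, y, 'cv', 5, 'options', opts);
% inmodel is a logical row vector marking the selected feature columns
selected = find(inmodel)
```

Here `sequentialfs` adds features one at a time (forward selection by default), keeping each addition only if it lowers the cross-validated criterion.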

In contrast, feature extraction transforms existing features into new features (predictor variables), allowing less-descriptive features to be dropped.

The Statistics and Machine Learning Toolbox includes many feature extraction methods, as follows:

  • PCA: This can be applied to summarize data in fewer dimensions by projection onto a unique orthogonal basis
  • Non-negative matrix factorization: This can be applied when model terms must represent non-negative quantities
  • Factor analysis: This can be applied to build explanatory models of data correlations
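The first of these methods, PCA, is available through the toolbox's `pca` function. The following is a minimal sketch, again assuming the built-in `fisheriris` sample dataset; the 95% variance threshold is an illustrative choice, not a fixed rule.

```matlab
% Minimal sketch: PCA with the Statistics and Machine Learning Toolbox.
load fisheriris                       % meas: 150x4 numeric predictors
[coeff, score, latent, ~, explained] = pca(meas);
% coeff:     orthonormal principal component loadings (one column each)
% score:     the data projected onto the principal components
% explained: percentage of total variance captured by each component

% Keep enough components to explain, say, 95% of the variance
numComponents = find(cumsum(explained) >= 95, 1);
reduced = score(:, 1:numComponents);  % lower-dimensional representation
```

Because the components are ordered by explained variance, truncating `score` to the leading columns summarizes the data in fewer dimensions, as described above.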

The following chart shows a stepwise regression example:

Figure 1.20: Stepwise regression example
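A result like the one in the figure can be reproduced with the toolbox's `stepwiselm` function. The sketch below assumes the built-in `carsmall` sample dataset, with `MPG` as the response; the choice of predictors is illustrative.

```matlab
% Minimal sketch: stepwise linear regression with stepwiselm.
% Assumes the built-in carsmall sample dataset.
load carsmall
tbl = table(Weight, Horsepower, Acceleration, MPG);

% Start from a constant-only model; stepwiselm then adds or removes
% terms based on their p-values until no change improves the model.
mdl = stepwiselm(tbl, 'constant', 'ResponseVar', 'MPG');
disp(mdl.Formula)    % the formula of the final selected model
```

Each add/remove decision is reported as the search runs, which is exactly the kind of trace summarized in charts like Figure 1.20.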