
Chapter 5. Dimension Reduction

As described in the Assessing a model/overfitting section of Chapter 2, Data Pipelines, indiscriminate reliance on a large number of features may cause overfitting: the model becomes so tightly coupled to the training set that different validation sets produce vastly different outcomes and quality metrics, such as AuROC.

Dimension reduction techniques alleviate this problem by detecting features that have little influence on the overall behavior of the model.

This chapter introduces three categories of dimension reduction techniques with two implementations in Scala:

  • Divergence with an implementation of the Kullback-Leibler distance
  • Principal components analysis
  • Estimation of low-dimensional feature space for nonlinear models

Other methodologies used to reduce the number of features, such as regularization or singular value decomposition, are discussed in later chapters.

But first, let's start our investigation by defining the problem.
