
Datasets and modeling

We're going to use two of the prior datasets: the simulated data from Chapter 4, Advanced Feature Selection in Linear Models, and the customer satisfaction data from Chapter 3, Logistic Regression. We'll start by building a classification tree on the simulated data, which will help us understand the basic principles of tree-based methods. Then, we'll move on to random forest and boosted trees applied to the customer satisfaction data. This exercise will provide an excellent comparison to the generalized linear models from before. Finally, I want to show you an interesting feature selection method that uses random forest, again on the simulated data. By interesting, I mean it's a valuable technique to add to your feature selection arsenal, but I'll point out a couple of caveats for you to consider in practical application.
