Summary

In this chapter, we used several of scikit-learn's methods to build a standard workflow for running and evaluating data mining models. We introduced the Nearest Neighbors algorithm, which scikit-learn implements as an estimator. Using this class is straightforward: first, we call its fit function on the training data, and then we use its predict function to predict the classes of the testing samples.
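The fit-then-predict workflow can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes scikit-learn's bundled Iris data (`load_iris`) as a stand-in dataset, uses `KNeighborsClassifier` with its default of five neighbors, and picks `random_state=14` arbitrarily so the split is repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (an assumption): the bundled Iris data needs no download
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples (the default) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

estimator = KNeighborsClassifier()        # default n_neighbors=5
estimator.fit(X_train, y_train)           # step 1: fit on the training data
y_predicted = estimator.predict(X_test)   # step 2: predict the test classes

# Accuracy is the percentage of test samples classified correctly
accuracy = np.mean(y_test == y_predicted) * 100
print(f"The accuracy is {accuracy:.1f}%")
```

Every scikit-learn estimator follows this same two-call pattern, so swapping in a different classifier only changes the line that constructs `estimator`.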

We then looked at preprocessing, using it to fix poor feature scaling. This was done with a transformer, the MinMaxScaler class. Transformers also have a fit method, followed by a transform method that takes data in one form as input and returns a transformed dataset as output.
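The transformer pattern can be illustrated with a small sketch. The toy arrays below are hypothetical, chosen only so the arithmetic is easy to follow: `fit` learns each feature's minimum and maximum from the training data, and `transform` then maps every value to `(x - min) / (max - min)` using those learned values, including for data the scaler has never seen.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_new = np.array([[2.5, 400.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)                        # learns per-feature min and max
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)     # reuses the training min and max

print(X_train_scaled)  # each column now runs from 0 to 1
print(X_new_scaled)
```

Here the first feature has min 1 and max 3, so 2.5 maps to (2.5 - 1) / 2 = 0.75; the second has min 100 and max 500, so 400 maps to (400 - 100) / 400 = 0.75. Fitting only on the training data and reusing those statistics for new data keeps test information from leaking into preprocessing.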

To investigate these transformations further, try replacing the MinMaxScaler with some of the other transformers mentioned in this chapter. Which is most effective, and why might that be the case?
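One way to run this experiment is sketched below. It chains each transformer with a Nearest Neighbors classifier in a scikit-learn `Pipeline` and scores each combination with 5-fold cross-validation. The dataset (`load_iris`) and the particular set of transformers compared are assumptions for illustration, not necessarily the ones the chapter used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, Normalizer, StandardScaler

# Stand-in dataset (an assumption); substitute the chapter's own data here
X, y = load_iris(return_X_y=True)

# Candidate transformers to swap in place of MinMaxScaler (illustrative choice)
transformers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler(),
    "MaxAbsScaler": MaxAbsScaler(),
    "Normalizer": Normalizer(),
}

scores = {}
for name, transformer in transformers.items():
    # The pipeline refits the transformer on each training fold only
    pipeline = Pipeline([("scale", transformer),
                         ("predict", KNeighborsClassifier())])
    fold_scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name] * 100:.1f}%")
```

Wrapping the transformer in a pipeline, rather than scaling the whole dataset up front, means each cross-validation fold fits the scaler only on its own training portion, so the comparison between transformers is fair.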

Other transformers also exist in scikit-learn, such as PCA, which we will use later in this book. Try some of these out as well, referring to scikit-learn's excellent documentation at https://scikit-learn.org/stable/modules/preprocessing.html

In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.