Summary

In this chapter, we extended our use of scikit-learn's classifiers and introduced the pandas library to manage our data. We analyzed real-world data on basketball results from the NBA, saw some of the problems that even well-curated data introduces, and created new features for our analysis.

We saw the effect that good features have on performance and used an ensemble algorithm, random forests, to further improve the accuracy. To take these concepts further, try to create your own features and test them out. Which features perform better? If you have trouble coming up with features, think about what other datasets can be included. For example, if key players are injured, this might affect the results of a specific match and cause a better team to lose.
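One way to approach this exercise is to score several candidate feature sets with the same classifier and compare their cross-validated accuracy. The following is only a minimal sketch under stated assumptions: the data is synthetic, and the column names (`HomeLastWin`, `VisitorLastWin`, `HomeKeyPlayerInjured`, `HomeWin`) and the helper `compare_feature_sets` are hypothetical placeholders for this illustration, not the dataset or code used in the chapter.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def compare_feature_sets(df, feature_sets, target, random_state=14):
    """Return the mean cross-validated accuracy for each named feature set."""
    y = df[target].values
    scores = {}
    for name, columns in feature_sets.items():
        clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
        fold_scores = cross_val_score(
            clf, df[columns].values, y, scoring="accuracy", cv=5
        )
        scores[name] = fold_scores.mean()
    return scores


# Synthetic stand-in data (an assumption for this sketch, not real NBA results).
rng = np.random.RandomState(14)
n_games = 400
df = pd.DataFrame({
    "HomeLastWin": rng.randint(0, 2, n_games),
    "VisitorLastWin": rng.randint(0, 2, n_games),
    "HomeKeyPlayerInjured": rng.randint(0, 2, n_games),  # hypothetical feature
})

# In this toy data, the home team wins unless its key player is injured,
# and about 10 percent of outcomes are flipped at random as noise.
expected = 1 - df["HomeKeyPlayerInjured"].values
flipped = rng.rand(n_games) < 0.1
df["HomeWin"] = np.where(flipped, 1 - expected, expected)

feature_sets = {
    "baseline": ["HomeLastWin", "VisitorLastWin"],
    "with_injury": ["HomeLastWin", "VisitorLastWin", "HomeKeyPlayerInjured"],
}
scores = compare_feature_sets(df, feature_sets, target="HomeWin")
for name, accuracy in scores.items():
    print("{}: {:.1f}%".format(name, accuracy * 100))
```

Because the toy outcome was constructed to depend on the injury column, the second feature set should score higher here. On real data there is no such guarantee, which is why comparing feature sets under the same cross-validation setup is worthwhile.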

In the next chapter, we will extend the affinity analysis that we performed in the first chapter to create a program to find similar books. We will see how to use algorithms for ranking and also use an approximation to improve the scalability of data mining.