
Summary

In this chapter, we have seen how out-of-core learning is possible by streaming data, no matter how big it is, from a text file or a database on your hard disk. These methods certainly apply to much larger datasets than the examples we used to demonstrate them (examples that, in fact, could have been solved in memory on unusually powerful hardware).
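
As a reminder of the basic pattern, the following sketch shows one way to stream a text file in fixed-size batches so that it never has to fit in memory as a whole (the file name train.csv and the batch size are illustrative assumptions, not values taken from the chapter):

import csv

def stream_csv(path, batch_size=1000):
    # Yield lists of rows, batch_size rows at a time, so the whole
    # file never has to be loaded into memory at once
    with open(path, newline='') as f:
        reader = csv.reader(f)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # the last, possibly smaller, batch

# Hypothetical usage:
# for chunk in stream_csv('train.csv'):
#     learn_from(chunk)  # for example, a partial_fit call on the chunk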

We also explained the core algorithm that makes out-of-core learning possible, SGD, and we examined its strengths and weaknesses, emphasizing that a stream must be truly stochastic (that is, presented in random order) for the algorithm to be really effective, unless the order itself is part of the learning objective. In particular, we introduced the Scikit-learn implementation of SGD, limiting our focus to the linear and logistic regression loss functions.
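
As a minimal sketch of that workflow on synthetic data (the mini-batch size, random seed, and the single up-front shuffle are illustrative choices; the loss name log_loss follows recent Scikit-learn releases, while older versions spell it log):

import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor

rng = np.random.RandomState(42)

# Synthetic data standing in for a stream (illustrative assumption)
X = rng.randn(10000, 20)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)          # binary target
y_reg = X.dot(rng.randn(20)) + rng.randn(10000) * 0.1  # continuous target

clf = SGDClassifier(loss='log_loss', random_state=42)  # logistic regression loss
reg = SGDRegressor(random_state=42)                    # squared (linear regression) loss by default

# Shuffle once so the stream is effectively stochastic,
# then feed the models one mini-batch at a time
idx = rng.permutation(len(X))
for start in range(0, len(X), 1000):
    batch = idx[start:start + 1000]
    clf.partial_fit(X[batch], y_class[batch], classes=[0, 1])
    reg.partial_fit(X[batch], y_reg[batch])

print(clf.score(X, y_class), reg.score(X, y_reg))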

Finally, we discussed data preparation, introduced the hashing trick and validation strategies for streams, and wrapped up the acquired knowledge on SGD by fitting two different models: one for classification and one for regression.
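
To make that concrete, here is a small sketch combining the hashing trick with progressive validation on a toy text stream (the documents, labels, number of hashed features, and batch size are all illustrative assumptions; as above, the spelling of the loss name depends on your Scikit-learn version):

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy documents and labels standing in for a text stream (illustrative)
docs = ["cheap pills buy now", "meeting agenda attached",
        "win money fast", "lunch at noon tomorrow"] * 250
labels = [1, 0, 1, 0] * 250

# The hashing trick: tokens are mapped into a fixed-size feature space,
# so no vocabulary has to be kept in memory
vectorizer = HashingVectorizer(n_features=2**18)
clf = SGDClassifier(loss='log_loss', random_state=1)

batch_size = 100
for start in range(0, len(docs), batch_size):
    X_batch = vectorizer.transform(docs[start:start + batch_size])
    y_batch = labels[start:start + batch_size]
    # Progressive validation: score each batch before learning from it
    if start > 0:
        print("accuracy on batch starting at %d: %.2f"
              % (start, clf.score(X_batch, y_batch)))
    clf.partial_fit(X_batch, y_batch, classes=[0, 1])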

In the next chapter, we will keep enriching our out-of-core capabilities by figuring out how to enable non-linearity in our learning scheme and how to use the hinge loss for support vector machines. We will also present alternatives to Scikit-learn, such as Liblinear, Vowpal Wabbit, and StreamSVM. Although they operate as external shell commands, all of them can easily be wrapped and controlled by Python scripts.
