- Large Scale Machine Learning with Python
- Bastiaan Sjardin Luca Massaron Alberto Boschetti
- 2021-07-14 10:39:49
Summary
In this chapter, we have seen how out-of-core learning is possible by streaming data, no matter how large, from a text file or a database on your hard disk. These methods apply to datasets much bigger than the examples we used to demonstrate them (which could actually be handled in memory on sufficiently powerful, above-average hardware).
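As a reminder of what streaming looks like in practice, here is a minimal sketch that reads a CSV source one row at a time, so that memory use stays constant regardless of file size. The helper name `stream_rows`, the column names, and the tiny in-memory sample (standing in for a large file on disk) are assumptions made for illustration only.

```python
import csv
import io

def stream_rows(handle, target_column):
    """Yield (features, target) pairs one row at a time from an open CSV handle."""
    reader = csv.DictReader(handle)
    for row in reader:
        # Separate the response from the predictors and convert text to floats.
        target = float(row.pop(target_column))
        features = {name: float(value) for name, value in row.items()}
        yield features, target

# Hypothetical sample; in real use this would be open("some_large_file.csv").
sample = io.StringIO("x1,x2,y\n1.0,2.0,3.0\n2.0,0.5,2.5\n4.0,1.0,5.0\n")

# Only one row is held in memory at any time; here we update a running mean
# of the target as a stand-in for an incremental learning step.
count, running_mean = 0, 0.0
for features, target in stream_rows(sample, target_column="y"):
    count += 1
    running_mean += (target - running_mean) / count
```

The same loop works unchanged on a file handle of any size, because the generator never materializes more than the current row.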
We also explained the core algorithm that makes out-of-core learning possible, stochastic gradient descent (SGD), and we examined its strengths and weaknesses. We emphasized that, for SGD to be effective, the stream must be truly stochastic (that is, presented in random order), unless the order is itself part of the learning objective. In particular, we introduced the Scikit-learn implementation of SGD, limiting our focus to the linear and logistic regression loss functions.
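The incremental fitting pattern can be sketched with Scikit-learn's `SGDRegressor` and its `partial_fit` method, which updates the model one mini-batch at a time. The synthetic data, the `true_weights` values, and the batch sizes below are assumptions chosen for illustration; because the batches are drawn at random, the stream is already in random order.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_weights = np.array([3.0, -2.0])  # hypothetical ground truth for the demo

def mini_batches(n_batches, batch_size):
    """Simulate a stream by generating small random batches on demand."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        y = X @ true_weights + rng.normal(scale=0.1, size=batch_size)
        yield X, y

# Default loss is the squared error used for linear regression.
model = SGDRegressor(random_state=0)
for X, y in mini_batches(n_batches=200, batch_size=50):
    # Each call performs one pass of SGD over the batch and keeps the
    # previously learned coefficients, so the full data never sits in memory.
    model.partial_fit(X, y)

learned_weights = model.coef_
```

After the stream is consumed, `learned_weights` should lie close to the weights used to generate the data. With a real file, the generator would read chunks from disk instead of sampling them.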
Finally, we discussed data preparation, introduced the hashing trick and validation strategies for streams, and consolidated what we learned about SGD by fitting two different models, one for classification and one for regression.
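The idea behind the hashing trick can be illustrated with the standard library alone: every feature string is mapped to a column index by a hash function, so the vector width is fixed in advance and no dictionary of features has to be built from the whole stream. This is a simplified sketch, not Scikit-learn's implementation; the function name `hash_features`, the use of MD5, the bucket count, and the sign bit (a common refinement that lets colliding features partly cancel out) are all assumptions made for illustration.

```python
import hashlib

def hash_features(tokens, n_buckets=2**10):
    """Map string tokens to a sparse {index: value} vector of fixed width."""
    vector = {}
    for token in tokens:
        # hashlib gives the same result on every run, unlike the built-in
        # hash(), which is salted per process for strings.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        sign = 1.0 if (digest[8] & 1) else -1.0
        vector[index] = vector.get(index, 0.0) + sign
    return vector

# Two hypothetical examples sharing one categorical value.
first = hash_features(["color=red", "size=large"])
second = hash_features(["color=red", "size=small"])
```

Both examples place `color=red` at the same index without ever having seen a list of all possible values, which is what makes the technique suitable for streams.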
In the next chapter, we will keep extending our out-of-core capabilities by figuring out how to enable non-linearity in our learning schema and how to use hinge loss for support vector machines. We will also present alternatives to Scikit-learn, such as Liblinear, Vowpal Wabbit, and StreamSVM. Although they operate as external shell commands, all of them can easily be wrapped and controlled by Python scripts.