
Chapter 2. Scalable Learning in Scikit-learn

Loading a dataset into memory, preparing a data matrix, training a machine learning algorithm, and testing its generalization capabilities on out-of-sample observations are usually not a big deal, given how powerful and affordable today's computers are. Increasingly often, however, the data to be processed is so large that it cannot be loaded into your computer's main memory. Even when it can be loaded, the result may be intractable in terms of both data management and machine learning.

There are viable alternatives to processing everything in main memory: splitting the data into samples, using parallelism, and learning in small batches or from single instances. This chapter focuses on the out-of-the-box solution offered by the Scikit-learn package: streaming mini-batches of instances (our observations) from data storage and learning incrementally from them. This solution is called out-of-core learning.
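As a minimal sketch of the idea, the code below streams mini-batches from a file on disk and updates a model one batch at a time through the `partial_fit` method, which Scikit-learn estimators such as `SGDClassifier` expose for incremental learning. It assumes a headerless CSV file of numeric features with the label in the last column; the file, the synthetic data, and the helper functions `write_demo_csv` and `stream_minibatches` are hypothetical stand-ins for a real dataset too large to fit in memory.

```python
import csv
import os
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier


def write_demo_csv(path, n_rows=1000, seed=0):
    """Hypothetical helper: write a small synthetic dataset to disk
    (a stand-in for a file too large to load into memory)."""
    rng = np.random.default_rng(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(n_rows):
            x = rng.normal(size=2)
            y = int(x[0] + x[1] > 0)  # linearly separable target
            writer.writerow([x[0], x[1], y])


def stream_minibatches(path, batch_size=100):
    """Hypothetical helper: yield (X, y) mini-batches read sequentially,
    so only one batch is held in memory at any time."""
    with open(path, newline="") as f:
        batch = []
        for row in csv.reader(f):
            batch.append([float(v) for v in row])
            if len(batch) == batch_size:
                arr = np.asarray(batch)
                yield arr[:, :-1], arr[:, -1].astype(int)
                batch = []
        if batch:  # flush the last, possibly smaller, batch
            arr = np.asarray(batch)
            yield arr[:, :-1], arr[:, -1].astype(int)


path = os.path.join(tempfile.mkdtemp(), "stream.csv")
write_demo_csv(path)

model = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # partial_fit must know every class up front

n_batches = 0
for X_batch, y_batch in stream_minibatches(path, batch_size=100):
    # Each call updates the existing coefficients instead of refitting
    model.partial_fit(X_batch, y_batch, classes=all_classes)
    n_batches += 1

# Evaluate on fresh out-of-sample observations drawn from the same rule
rng = np.random.default_rng(1)
X_test = rng.normal(size=(200, 2))
y_test = (X_test.sum(axis=1) > 0).astype(int)
print(n_batches, model.score(X_test, y_test))
```

Because a streaming model never sees the whole dataset at once, the full set of class labels has to be declared on the first `partial_fit` call; a batch that happens to contain only one class would otherwise leave the model unaware of the others.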

Treating the data in manageable chunks and learning incrementally is a great idea. Implementing it, however, can prove challenging: only a limited set of learning algorithms supports it, and streaming data in a flow requires you to think differently about data management and feature extraction. Beyond presenting Scikit-learn's functionalities for out-of-core learning, we will also strive to offer Python solutions to apparently daunting problems you may face when you can observe only a small portion of your data at a time.

In this chapter, we will cover the following topics:

  • The way out-of-core learning is implemented in Scikit-learn
  • Effectively managing streams of data using the hashing trick
  • The nuts and bolts of stochastic learning
  • Implementing data science with online learning
  • Unsupervised transformations of streams of data