  • Machine Learning Algorithms
  • Giuseppe Bonaccorso
  • 2021-07-02 18:53:29

scikit-learn toy datasets

scikit-learn provides some built-in datasets that can be used for testing purposes. They're all available in the package sklearn.datasets and share a common structure: the data attribute contains the whole input set X, while target contains the class labels (for classification) or the target values (for regression). For example, for the Boston house pricing dataset (used for regression), we have:

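>>> # Note: load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2,
>>> # so this session requires an older release.
>>> from sklearn.datasets import load_boston

>>> boston = load_boston()
>>> X = boston.data      # input samples, one row per house
>>> Y = boston.target    # regression targets (house prices)

>>> X.shape
(506, 13)
>>> Y.shape
(506,)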
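
On scikit-learn 1.2 and later, load_boston() is no longer available. As a minimal sketch of the same data/target structure, the snippet below uses load_diabetes(), another regression dataset bundled with the package. The choice of this dataset is an assumption made for illustration; it is not the one used in the original text.

```python
from sklearn.datasets import load_diabetes

# load_diabetes() is a bundled regression dataset that exposes
# the same attributes as the Boston example: data and target
diabetes = load_diabetes()
X = diabetes.data      # input samples
Y = diabetes.target    # regression targets

print(X.shape)  # (442, 10)
print(Y.shape)  # (442,)
```

The figures differ from the Boston output above (506 samples, 13 features), but the layout is identical: one row of X per sample and one entry of Y per row.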

The Boston dataset therefore contains 506 samples with 13 features each and a single target value per sample. In this book, we're going to use it for regression tasks, and the handwritten digits dataset (load_digits()) for classification tasks. Although it is often referred to as MNIST, load_digits() returns a much smaller collection of 1,797 images of 8 × 8 pixels rather than the original MNIST data. scikit-learn also provides functions for creating dummy datasets from scratch: make_classification(), make_regression(), and make_blobs() (particularly useful for testing clustering algorithms). They're very easy to use and, in many cases, they are the best choice for testing a model without loading more complex datasets.
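
The three generator functions can be sketched as follows. The sample counts, feature counts, noise level, and random_state values are arbitrary choices made for illustration, not settings taken from the text.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Binary classification problem: 200 samples, 5 features (3 informative)
X_clf, Y_clf = make_classification(n_samples=200, n_features=5,
                                   n_informative=3, n_redundant=0,
                                   random_state=1000)

# Regression problem: 200 samples, 5 features, Gaussian noise on the target
X_reg, Y_reg = make_regression(n_samples=200, n_features=5,
                               noise=10.0, random_state=1000)

# Three isotropic Gaussian blobs in 2D, handy for clustering tests
X_blobs, Y_blobs = make_blobs(n_samples=200, n_features=2,
                              centers=3, random_state=1000)

print(X_clf.shape, Y_clf.shape)
print(X_reg.shape, Y_reg.shape)
print(X_blobs.shape, Y_blobs.shape)
```

Each function returns the pair (X, Y) directly instead of an object with data and target attributes. Fixing random_state makes the generated dataset reproducible across runs.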

Visit http://scikit-learn.org/stable/datasets/ for further information.
The handwritten digits dataset bundled with scikit-learn is small (1,797 images of 8 × 8 pixels), which keeps it light enough to ship with the package. If you want to experiment with the original MNIST dataset, refer to the website managed by Y. LeCun, C. Cortes, and C. Burges: http://yann.lecun.com/exdb/mnist/. There you can download the full version, made up of 70,000 handwritten digits already split into training and test sets.
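
To see how compact the bundled dataset is compared with the 70,000-digit original, a short sketch can inspect what load_digits() returns. It follows the same data/target structure described earlier and adds an images attribute holding each sample as a 2D array.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image flattened into 64 features
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# The labels are the ten digit classes
classes = np.unique(digits.target)
print(classes)              # [0 1 2 3 4 5 6 7 8 9]
```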