
Visualizing data from an external dataset

As a final test for this chapter, let's visualize some data from an external dataset, such as the digits dataset from scikit-learn.

Specifically, we will need three tools for visualization:

  • scikit-learn for the actual data
  • NumPy for data munging
  • Matplotlib for plotting

So let's start by importing all of these:

In [1]: import numpy as np
... from sklearn import datasets
... import matplotlib.pyplot as plt
... %matplotlib inline

The first step is to actually load the data:

In [2]: digits = datasets.load_digits()
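If a script only needs the pixel values and the labels, load_digits can also hand them back directly as a tuple through its return_X_y parameter. The following is a minimal standalone sketch. Note that this form drops the images field, so we stick with the full object for the rest of this section:

```python
from sklearn import datasets

# return_X_y=True returns a (data, target) tuple instead of the full object;
# the 8 x 8 images field is not part of this tuple
X, y = datasets.load_digits(return_X_y=True)
print(X.shape)  # (1797, 64): one row of 64 pixel values per image
print(y.shape)  # (1797,): one integer label between 0 and 9 per image
```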

If we remember correctly, digits is supposed to have two different fields: a data field containing the actual image data, and a target field containing the image labels. Rather than trust our memory, we should simply investigate the digits object. We do this by typing out its name, adding a period, and then hitting the TAB key: digits.<TAB>. This will reveal that the digits object also contains some other fields, such as one called images. The two fields, images and data, seem to differ only in their shape:

In [3]: print(digits.data.shape)
... print(digits.images.shape)
Out[3]: (1797, 64)
... (1797, 8, 8)

In both cases, the first dimension corresponds to the number of images in the dataset. However, data has all the pixels lined up in one big vector, whereas images preserves the 8 x 8 spatial arrangement of each image.
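We can verify both observations outside of an interactive session. The object returned by load_digits behaves like a dictionary, so its fields can be listed without TAB completion. The exact set of keys depends on the scikit-learn version. Also, flattening each 8 x 8 image row by row should reproduce the data field exactly. A minimal sketch:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# List the available fields (exact keys vary by scikit-learn version)
print(sorted(digits.keys()))

# Collapse each 8 x 8 image into a single row of 64 pixels
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)                          # (1797, 64)
print(np.array_equal(flattened, digits.data))   # True
```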

Thus, if we wanted to plot a single image, the images field would be more appropriate. First, we grab a single image from the dataset using NumPy's array slicing:

In [4]: img = digits.images[0, :, :]

Here, we are saying that we want to grab the first item of the 1,797-item array, along with all of its 8 x 8 = 64 pixels. We can then plot the image using plt's imshow function:

In [5]: plt.imshow(img, cmap='gray')
Out[5]: <matplotlib.image.AxesImage at 0x7efcd27f30f0>

The preceding command gives the following output:

An example image from the digits dataset

Note that I also specified a colormap with the cmap argument. Since version 2.0, Matplotlib uses the viridis colormap by default. Earlier versions defaulted to jet, which was also MATLAB's former default. For grayscale images, however, the gray colormap makes more sense.
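To see the difference, we can draw the same digit with the default colormap and with gray side by side. The sketch below assumes a standalone script without a display, which is why it selects the non-interactive Agg backend and renders into an in-memory buffer. Inside Jupyter, drop the matplotlib.use line:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless script; skip this line in Jupyter
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
img = digits.images[0]

# The default colormap is read from rcParams ('viridis' since Matplotlib 2.0)
print(plt.rcParams["image.cmap"])

# Draw the same digit twice: once with the default map, once in grayscale
fig, (ax_default, ax_gray) = plt.subplots(1, 2)
im_default = ax_default.imshow(img)
im_gray = ax_gray.imshow(img, cmap="gray")
ax_default.set_title(im_default.cmap.name)
ax_gray.set_title(im_gray.cmap.name)

# Render the figure into memory instead of writing a file to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

If dark digits on a white background are preferred, the reversed map cmap="gray_r" flips the intensities.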

Finally, we can plot a whole series of digit samples using plt's subplot function. The subplot function works the same way as in MATLAB: we specify the number of rows, the number of columns, and the current subplot index (which starts counting at 1). We will use a for loop to iterate over the first ten images in the dataset, assigning each image its own subplot:

In [6]: for image_index in range(10):
...     # images are 0-indexed, but subplots are 1-indexed
...     subplot_index = image_index + 1
...     plt.subplot(2, 5, subplot_index)
...     plt.imshow(digits.images[image_index, :, :], cmap='gray')

This leads to the following output:

Ten example images from the digits database
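The loop above relies on the stateful plt.subplot interface. As an alternative sketch (our own variation, not the approach used above), Matplotlib's object-oriented plt.subplots function creates the entire 2 x 5 grid in one call. This also makes it easy to label each panel with its entry from the target field. The figure size and the choice to hide the axes are arbitrary, and the Agg backend again assumes a headless script:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; skip this line in Jupyter
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Create the whole 2 x 5 grid at once; axes is a 2 x 5 array of Axes objects
fig, axes = plt.subplots(2, 5, figsize=(8, 4))

# axes.flat walks the grid row by row, so no 1-based subplot index is needed
for image_index, ax in enumerate(axes.flat):
    ax.imshow(digits.images[image_index], cmap="gray")
    ax.set_title(str(digits.target[image_index]))  # ground-truth label
    ax.axis("off")  # hide ticks, which carry no meaning for a pixel grid

fig.tight_layout()
```

In a notebook, the %matplotlib inline magic from the first cell takes care of displaying the resulting figure.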

Another great resource for all sorts of datasets is the Machine Learning Repository of my alma mater, the University of California, Irvine:
http://archive.ics.uci.edu/ml.