官术网_书友最值得收藏!

Scatter plot to show clusters

While we have seen a scatter plot of random dots before, scatter plots are most useful with representing discrete data points that show a trend or clusters. Each data series will be plotted in different color per plotting command by default, which helps us distinguish the distinct dots from each series. To demonstrate the idea, we will generate two artificial clusters of data points using a simple random number generator function in NumPy, shown as follows:

import matplotlib.pyplot as plt

# seed the random number generator to keep results reproducible
np.random.seed(123)

# Generate 10 random numbers around 2 as x-coordinates of the first data series
x0 = np.random.rand(10)+1.5

# Generate the y-coordinates another data series similarly
np.random.seed(321)
y0 = np.random.rand(10)+2
np.random.seed(456)
x1 = np.random.rand(10)+2.5
np.random.seed(789)
y1 = np.random.rand(10)+2
plt.scatter(x0,y0)
plt.scatter(x1,y1)

plt.show()

We can see from the following plot that there are two artificially created clusters of dots colored blue (approximately in the left half) and orange (approximately in the right half) respectively:

There is another way to generate clusters and plot them on scatter plots. We can generate clusters of data points for testing and demonstration more directly using the make_blobs() function in a package called sklearn, which is developed for more advanced data analysis and data mining, as shown in the following snippet. We can specify the colors according to the assigned feature (cluster identity):

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# make blobs with 3 centers with random seed of 23
blob_coords,features = make_blobs(centers=3, random_state=23)

# plot the blobs, with c value set to map colors to features
plt.scatter(blob_coords[:, 0], blob_coords[:, 1], marker='x', c=features)
plt.show()

As the make_blob function generates dots based on isotropic Gaussian distribution, we can see from the resultant plot that they are better clustered as three distinct blobs of dots centralized at three points:

scikit-learn is a powerful Python package that provides many different simple functions for data mining and data analysis. It has a versatile suite of algorithms for classification, regression, clustering, dimension reduction, and modeling. It also allows data preprocessing and pipelining multiple processing stages.

For getting familiar with the scikit-learn library, we can use the datasets preloaded in package, such as the famous iris petals, or generate datasets according to the specified distribution as shown before. Here we'll only use the data to illustrate a simple visualization using scatter plot, and we won't go into the details just yet. More examples are available and can be found by clicking on this link:

http://scikit-learn.org/stable/auto_examples/datasets/plot_random_dataset.html 

The previous example is a demonstration of an easier way to map dot color with labeled features, if any. The details of make_blobs() and other scikit-learn functions are out of our scope of introducing basic plots in this chapter.

We encourage our readers to seek resources and books on our Mapt platform to get a better understanding of the scikit-learn library usage. 

 Alternatively, readers can also read the scikit-learn documentation here: http://scikit-learn.org.

主站蜘蛛池模板: 盱眙县| 临沧市| 宜春市| 宜春市| 健康| 凤翔县| 加查县| 聊城市| 曲周县| 临澧县| 樟树市| 德阳市| 安国市| 吉水县| 巴南区| 滦平县| 萍乡市| 都江堰市| 霸州市| 枞阳县| 武宁县| 方城县| 通江县| 丰原市| 乌拉特后旗| 土默特左旗| 象州县| 八宿县| 江川县| 南安市| 新巴尔虎左旗| 东至县| 敖汉旗| 星座| 若尔盖县| 隆尧县| 焉耆| 泗洪县| 通化县| 织金县| 铁岭市|