
Book title: Python Data Analysis
Author: Ivan Idris
Chapter word count: 529
Updated: 2021-08-05 17:31:54

Basic descriptive statistics with NumPy

In this book, we will try to use as many varied datasets as possible, depending on the availability of the data. Unfortunately, this means that the subject of the data might not exactly match your interests. Every dataset has its own quirks, but the general skills you acquire in this book should transfer to your own field. In this chapter, we will load a number of comma-separated values (CSV) files into NumPy arrays in order to analyze the data.

To load the data, we will use the NumPy loadtxt() function as follows:

Note

The code for this example can be found in basic_stats.py in the code bundle.

```
import numpy as np
from scipy.stats import scoreatpercentile

# Load the second column (percentage of new cases), skipping the header row
data = np.loadtxt("mdrtb_2012.csv", delimiter=',', usecols=(1,), skiprows=1, unpack=True)

# Each statistic is available as an ndarray method and as a NumPy function
print("Max method", data.max())
print("Max function", np.max(data))

print("Min method", data.min())
print("Min function", np.min(data))

print("Mean method", data.mean())
print("Mean function", np.mean(data))

print("Std method", data.std())
print("Std function", np.std(data))

# The median equals the 50th percentile of the data
print("Median", np.median(data))
print("Score at percentile 50", scoreatpercentile(data, 50))
```

Next, we will compute the mean, median, maximum, minimum, and standard deviation of the data in a NumPy array.

Note

If these terms sound unfamiliar to you, please take some time to learn about them from Wikipedia or any other source. As mentioned in the Preface, we will assume familiarity with basic mathematical and statistical concepts.
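For reference, here are the definitions behind these statistics for N values x_1, …, x_N, where x_(k) denotes the k-th smallest value. The standard deviation is written with a 1/N factor because that is what NumPy's std() computes by default (ddof=0); the sample standard deviation divides by N − 1 instead.

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

$$\operatorname{median} = \begin{cases} x_{((N+1)/2)} & N \text{ odd} \\ \tfrac{1}{2}\left(x_{(N/2)} + x_{(N/2+1)}\right) & N \text{ even} \end{cases}$$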

The data comes from the mdrtb_2012.csv file, which can be found in the code bundle. This is an edited version of the CSV file that can be downloaded from the WHO website at https://extranet.who.int/tme/generateCSV.asp?ds=mdr_estimates. It contains data about multidrug-resistant tuberculosis (MDR-TB). The reduced file we are going to use contains only two columns: the country and the percentage of new cases. Here are the first two lines of the file:

```
country,e_new_mdr_pcnt
Afghanistan,3.5
```

Now, let's compute the mean, median, maximum, minimum, and standard deviation of the data:

First, we will load the data with the following function call:
```
data = np.loadtxt("mdrtb_2012.csv", delimiter=',', usecols=(1,), skiprows=1, unpack=True)
```
In the preceding call, we specify a comma as the delimiter, select the second column with usecols=(1,) (column indices start at 0), and skip the header line with skiprows=1. We also pass the file name, assuming that the file is in the current directory; otherwise, we would have to specify the correct path.
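Because loadtxt() accepts any file-like object, the effect of these parameters can be tried without the real file. The following minimal sketch feeds loadtxt() an in-memory CSV via io.StringIO; the sample_csv string is a hypothetical stand-in in which only the Afghanistan row comes from the real file and the other two rows are made-up values.

```python
import io

import numpy as np

# Hypothetical stand-in for mdrtb_2012.csv: only the Afghanistan row is real,
# CountryA and CountryB are made-up rows for illustration.
sample_csv = """country,e_new_mdr_pcnt
Afghanistan,3.5
CountryA,1.2
CountryB,0.0
"""

# io.StringIO makes the string behave like an open file for np.loadtxt.
data = np.loadtxt(
    io.StringIO(sample_csv),
    delimiter=",",  # fields are separated by commas
    usecols=(1,),   # zero-based index: 1 selects the second column
    skiprows=1,     # skip the header line "country,e_new_mdr_pcnt"
    unpack=True,    # return columns as separate arrays; one column gives a 1-D array
)

print(data)        # [3.5 1.2 0. ]
print(data.shape)  # (3,)
```

Without skiprows=1, loadtxt() would try to convert the header text e_new_mdr_pcnt to a float and raise a ValueError.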

The maximum of an array can be obtained either with an ndarray method or with the corresponding NumPy function. The same goes for the minimum, mean, and standard deviation. The following code snippet prints each statistic both ways:

```
print("Max method", data.max())
print("Max function", np.max(data))

print("Min method", data.min())
print("Min function", np.min(data))

print("Mean method", data.mean())
print("Mean function", np.mean(data))

print("Std method", data.std())
print("Std function", np.std(data))
```

The output is as follows (produced with Python 2, which prints fewer decimal places than Python 3):

```
Max method 50.0
Max function 50.0
Min method 0.0
Min function 0.0
Mean method 3.2787037037
Mean function 3.2787037037
Std method 5.76332073654
Std function 5.76332073654
```
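The method and function forms above return the same values, and the standard deviation they report is the population version that divides by N. The following sketch checks both points on a small hypothetical array (not taken from mdrtb_2012.csv) and shows how the ddof parameter switches to the sample standard deviation, which divides by N - 1.

```python
import numpy as np

# Hypothetical sample values, not taken from mdrtb_2012.csv.
x = np.array([0.0, 1.5, 2.5, 4.0])

# Each ndarray method has an equivalent NumPy function with the same result.
assert x.max() == np.max(x) == 4.0
assert x.min() == np.min(x) == 0.0
assert x.mean() == np.mean(x) == 2.0

# By default std() uses ddof=0, dividing by N (population standard deviation).
population_std = np.sqrt(np.sum((x - x.mean()) ** 2) / x.size)
print(np.isclose(x.std(), population_std))  # True

# With ddof=1 the divisor becomes N - 1 (sample standard deviation).
sample_std = np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1))
print(np.isclose(np.std(x, ddof=1), sample_std))  # True
```

For large datasets such as the WHO file the two versions differ only slightly, but for small samples the choice of ddof matters.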

The median can be retrieved with the NumPy median() function or with the SciPy scoreatpercentile() function, which estimates the 50th percentile of the data:
```
print("Median", np.median(data))
print("Score at percentile 50", scoreatpercentile(data, 50))
```
The following is printed:
```
Median 1.8
Score at percentile 50 1.8
```
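The median is the middle value of the sorted data, or the average of the two middle values when the count is even. The following sketch illustrates both cases on hypothetical arrays and, to stay NumPy-only, uses np.percentile() with its default linear interpolation as the percentile-based counterpart rather than SciPy's scoreatpercentile().

```python
import numpy as np

# Hypothetical values with an even count, so there is no single middle element.
x = np.array([4.0, 0.0, 2.5, 1.5])

# Sorted x is [0.0, 1.5, 2.5, 4.0]; the median averages 1.5 and 2.5.
print(np.median(x))  # 2.0

# The 50th percentile with default linear interpolation gives the same estimate.
print(np.percentile(x, 50))  # 2.0

# Hypothetical values with an odd count: the median is the middle sorted value.
y = np.array([3.5, 0.0, 1.8])
print(np.median(y))  # 1.8
```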
