- Mastering Python for Data Science
- Samir Madhavan
- 2021-07-16 20:14:20
A confidence interval
A confidence interval is a type of interval estimate for a population parameter. It gives a range of values within which the population parameter, such as the population mean, is expected to lie at a stated level of confidence.

Let's try to understand this concept by using an example. Suppose we want to estimate, with a 95% confidence interval, the average height of Kenyan men at a national level. Since we cannot measure every man in Kenya, we work from a sample.
Let's take 50 men and their heights in centimeters:
>>> import numpy as np
>>> height_data = np.array([
...     186.0, 180.0, 195.0, 189.0, 191.0, 177.0, 161.0, 177.0, 192.0, 182.0,
...     185.0, 192.0, 173.0, 172.0, 191.0, 184.0, 193.0, 182.0, 190.0, 185.0,
...     181.0, 188.0, 179.0, 188.0, 170.0, 179.0, 180.0, 189.0, 188.0, 185.0,
...     170.0, 197.0, 187.0, 182.0, 173.0, 179.0, 184.0, 177.0, 190.0, 174.0,
...     203.0, 206.0, 173.0, 169.0, 178.0, 201.0, 198.0, 166.0, 171.0, 180.0])
A histogram of the data shows that it is roughly normally distributed:
>>> import matplotlib.pyplot as plt
>>> plt.hist(height_data, 30, density=True)
>>> plt.show()

The mean of the distribution is as follows:
>>> height_data.mean()
183.24000000000001
So, the average height of a man in the sample is 183.24 cm.
To determine the confidence interval, we'll now define the standard error of the mean.
The standard error of the mean estimates how far the sample mean is likely to deviate from the population mean. It is defined using the following formula:
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Here, s is the standard deviation of the sample, and n is the number of elements of the sample.
This can be calculated using the sem() function of the SciPy package:
>>> from scipy import stats
>>> stats.sem(height_data)
1.3787187190005252
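As a quick check on the formula, the standard error can also be computed by hand and compared with SciPy's result. The sketch below assumes the sample standard deviation uses n - 1 in the denominator (ddof=1), which is the default for stats.sem(); the helper name standard_error is illustrative and does not come from the text.

```python
import numpy as np
from scipy import stats

# The 50 sample heights (in cm) used in this section.
height_data = np.array([
    186.0, 180.0, 195.0, 189.0, 191.0, 177.0, 161.0, 177.0, 192.0, 182.0,
    185.0, 192.0, 173.0, 172.0, 191.0, 184.0, 193.0, 182.0, 190.0, 185.0,
    181.0, 188.0, 179.0, 188.0, 170.0, 179.0, 180.0, 189.0, 188.0, 185.0,
    170.0, 197.0, 187.0, 182.0, 173.0, 179.0, 184.0, 177.0, 190.0, 174.0,
    203.0, 206.0, 173.0, 169.0, 178.0, 201.0, 198.0, 166.0, 171.0, 180.0])


def standard_error(sample):
    """Hypothetical helper: s / sqrt(n), with s computed using ddof=1."""
    sample = np.asarray(sample, dtype=float)
    s = sample.std(ddof=1)           # sample standard deviation
    return s / np.sqrt(sample.size)  # divide by the square root of n


manual_sem = standard_error(height_data)
scipy_sem = stats.sem(height_data)

# Both values should agree up to floating-point error.
print(manual_sem, scipy_sem)
```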
So, the standard error of the mean is 1.38 cm. The lower and upper limits of the confidence interval can be determined by using the following formula:
Upper/Lower limit = mean(height) ± z * SEmean(x)
Here, z is the number of standard errors that corresponds to the chosen confidence level. For 95% confidence, z = 1.96, because about 95% of the area under a normal distribution lies within 1.96 standard deviations of the mean.
For the lower limit:
183.24 - (1.96 * 1.38) = 180.54
For the upper limit:
183.24 + (1.96 * 1.38) = 185.94
We can say with 95% confidence that the population mean lies between 180.54 cm and 185.94 cm.
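The same limits can be computed in code rather than by hand. This is a minimal sketch that assumes the normal approximation used in this section; it takes the multiplier from stats.norm.ppf(0.975) instead of hard-coding 1.96, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# The 50 sample heights (in cm) used in this section.
height_data = np.array([
    186.0, 180.0, 195.0, 189.0, 191.0, 177.0, 161.0, 177.0, 192.0, 182.0,
    185.0, 192.0, 173.0, 172.0, 191.0, 184.0, 193.0, 182.0, 190.0, 185.0,
    181.0, 188.0, 179.0, 188.0, 170.0, 179.0, 180.0, 189.0, 188.0, 185.0,
    170.0, 197.0, 187.0, 182.0, 173.0, 179.0, 184.0, 177.0, 190.0, 174.0,
    203.0, 206.0, 173.0, 169.0, 178.0, 201.0, 198.0, 166.0, 171.0, 180.0])

sample_mean = height_data.mean()
sem = stats.sem(height_data)

# Two-sided 95% interval: 2.5% in each tail, so use the 97.5th percentile.
z = stats.norm.ppf(0.975)  # approximately 1.96

lower = sample_mean - z * sem
upper = sample_mean + z * sem
print(f"95% CI: {lower:.2f} cm to {upper:.2f} cm")

# Cross-check with SciPy's built-in interval for a normal distribution.
check_lower, check_upper = stats.norm.interval(0.95, loc=sample_mean, scale=sem)
```

Because the code uses the unrounded standard error, its limits can differ from a hand calculation in the last decimal place.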

Let's assume we take a sample of 50 people, record their height, and then repeat this process 30 times. We can then plot the averages of each sample and observe the distribution.

The commands that simulate this experiment and plot the result are as follows:
>>> average_height = []
>>> for i in range(30):
...     sample50 = np.random.normal(183, 10, 50).round()
...     average_height.append(sample50.mean())
...
>>> plt.hist(average_height, 20, density=True)
>>> plt.show()
You can observe that the sample mean ranges from about 180 cm to 187 cm across the 30 simulated samples of 50 men each.
Let's see what happens when we sample 1000 men and repeat the process 30 times:
>>> average_height = []
>>> for i in range(30):
...     sample1000 = np.random.normal(183, 10, 1000).round()
...     average_height.append(sample1000.mean())
...
>>> plt.hist(average_height, 10, density=True)
>>> plt.show()

As you can see, the sample mean now varies only from 182.4 cm to 183.4 cm. What does this mean?
It means that as the sample size increases, the standard error of the mean decreases, so the confidence interval becomes narrower and we can pin down more precisely the interval in which the population mean lies.
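One way to see this numerically, rather than reading it off the histograms, is to compare the spread of the simulated sample means with the theoretical standard error sigma / sqrt(n). The sketch below reuses the simulation parameters from this section (mean 183, standard deviation 10, 30 repetitions). The seed and the helper name spread_of_sample_means are assumptions made for reproducibility and illustration, and rounding to whole centimeters is left out for simplicity.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility


def spread_of_sample_means(n, repeats=30, mu=183.0, sigma=10.0):
    """Hypothetical helper: standard deviation of `repeats` sample means,
    each computed from `n` simulated heights."""
    means = [rng.normal(mu, sigma, n).mean() for _ in range(repeats)]
    return np.std(means, ddof=1)


results = {}
for n in (50, 1000):
    observed = spread_of_sample_means(n)
    theoretical = 10.0 / np.sqrt(n)  # sigma / sqrt(n)
    results[n] = (observed, theoretical)
    print(f"n={n}: observed spread {observed:.2f} cm, "
          f"theoretical standard error {theoretical:.2f} cm")
```

With only 30 repetitions the observed spread is itself noisy, so it should land near the theoretical value rather than match it exactly.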