
Variance

As we saw in the first example, the mean isn't sufficient to describe non-homogeneous or very dispersed samples.

To capture in a single value how dispersed the sample set's values are, we turn to the concept of variance. It takes the mean of the sample set as a starting point and then averages the squared distances of the samples from that mean. The greater the variance, the more scattered the sample set.
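As a small worked illustration (the four values below are made up for this sketch and do not come from the examples in this chapter), take the set {1, 2, 3, 6}. Its mean is 3, and averaging the squared distances from that mean gives:

$$
\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (6-3)^2}{4} = \frac{4 + 1 + 0 + 9}{4} = 3.5
$$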

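The canonical definition of variance is as follows:

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2
$$

Here $x_i$ are the samples, $n$ is the number of samples, and $\mu$ is their mean. This is the population form, which divides by $n$, as the code below does.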

Let's write the following sample code snippet to illustrate this concept, adopting the previously used libraries. For the sake of clarity, we are repeating the declaration of the mean function:

    import math  # This library is needed for the power operation

    def mean(sampleset):  # Definition header for the mean function
        total = 0
        for element in sampleset:
            total = total + element
        return total / len(sampleset)

    def variance(sampleset):  # Definition header for the variance function
        total = 0
        setmean = mean(sampleset)
        for element in sampleset:
            # Accumulate the squared distance of each sample from the mean
            total = total + math.pow(element - setmean, 2)
        return total / len(sampleset)

    myset1 = [2., 10., 3., 6., 4., 6., 10.]  # We create the data sets
    myset2 = [1., -100., 15., -100., 21.]
    # The parenthesized form of print works in both Python 2 and Python 3
    print("Variance of first set:" + str(variance(myset1)))
    print("Variance of second set:" + str(variance(myset2)))

The preceding code will generate the following output under Python 2 (Python 3 prints the same values, possibly with more digits):

    Variance of first set:8.69387755102
    Variance of second set:3070.64

As you can see, the variance of the second set is much higher, because its values are far more dispersed. Squaring each distance before averaging makes large deviations count disproportionately, which accentuates the difference between the two sets.
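The following is a minimal sketch, assuming Python 3, that cross-checks the hand-written calculation against `statistics.pvariance` from the standard library and illustrates the quadratic effect. The compact `variance` rewrite and the `stretched` list are introduced here for illustration only and are not part of the original example.

```python
import statistics

def variance(sampleset):
    """Population variance: the mean of the squared deviations from the mean."""
    setmean = sum(sampleset) / len(sampleset)
    return sum((element - setmean) ** 2 for element in sampleset) / len(sampleset)

myset1 = [2., 10., 3., 6., 4., 6., 10.]
myset2 = [1., -100., 15., -100., 21.]

# Cross-check each result against the standard library's population variance.
for name, data in (("first", myset1), ("second", myset2)):
    ours = variance(data)
    reference = statistics.pvariance(data)
    print("Variance of %s set: %.4f (statistics.pvariance: %.4f)" % (name, ours, reference))

# Illustrative assumption: move every sample twice as far from the mean.
# Because the distances are squared, the variance grows by a factor of four.
setmean = sum(myset1) / len(myset1)
stretched = [setmean + 2 * (element - setmean) for element in myset1]
print("Stretched / original: %.4f" % (variance(stretched) / variance(myset1)))
```

Both functions should agree to within floating-point rounding, and the final ratio should come out at 4, which is the quadratic behavior described above: doubling the distances quadruples the variance.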
