- Machine Learning for Developers
- Rodolfo Bonnin
- 237 words
- 2021-07-02 15:46:46
Variance
As we saw in the first example, the mean isn't sufficient to describe non-homogeneous or very dispersed samples.
To capture in a single value how dispersed the sample set's values are, we turn to the concept of variance. It takes the mean of the sample set as a starting point and then averages the squared distances of the samples from that mean. The greater the variance, the more scattered the sample set.
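The canonical definition of variance for a sample set of $n$ values $x_1, \dots, x_n$ with mean $\mu$ is as follows:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$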
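As a quick hand calculation, take a small hypothetical sample set 1, 2, 3, 6 (chosen for illustration, not taken from the original text). Writing $\mu$ for its mean and $\sigma^2$ for its variance, the mean comes first and the squared distances from it are then averaged:

$$\mu = \frac{1 + 2 + 3 + 6}{4} = 3$$

$$\sigma^2 = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (6-3)^2}{4} = \frac{4 + 1 + 0 + 9}{4} = 3.5$$

The single value 6, which lies 3 units from the mean, contributes 9 of the 14 units in the numerator.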

Let's write the following sample code snippet to illustrate this concept, adopting the previously used libraries. For the sake of clarity, we are repeating the declaration of the mean function:
import math  # This library is needed for the power operation
def mean(sampleset):  # Arithmetic mean of the sample set
    total = 0
    for element in sampleset:
        total = total + element
    return total / len(sampleset)
def variance(sampleset):  # Variance: mean of the squared distances from the mean
    total = 0
    setmean = mean(sampleset)
    for element in sampleset:
        total = total + math.pow(element - setmean, 2)
    return total / len(sampleset)
myset1 = [2., 10., 3., 6., 4., 6., 10.]  # We create the data sets
myset2 = [1., -100., 15., -100., 21.]
print("Variance of first set: %.4f" % variance(myset1))  # Rounded to four decimals
print("Variance of second set: %.4f" % variance(myset2))
The preceding code will generate the following output:
Variance of first set: 8.6939
Variance of second set: 3070.6400
As you can see, the variance of the second set is much higher, reflecting its widely dispersed values. Because we average the squared distances, which is a quadratic operation, large deviations are weighted disproportionately, and this makes the difference between the two sets stand out.
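To see the effect of squaring in isolation, the following sketch compares the variance with the mean absolute deviation, which averages the unsquared distances. The helper `mean_abs_deviation` is a hypothetical name introduced here, and the sketch assumes Python 3.4 or later for the standard `statistics` module, whose `pvariance` divides by the number of samples like the `variance` function in this section.

```python
import statistics


def mean_abs_deviation(sampleset):
    """Average absolute distance from the mean (hypothetical helper for illustration)."""
    center = statistics.mean(sampleset)
    return sum(abs(element - center) for element in sampleset) / len(sampleset)


# Same data sets as in the variance example
myset1 = [2., 10., 3., 6., 4., 6., 10.]
myset2 = [1., -100., 15., -100., 21.]

for name, data in (("first", myset1), ("second", myset2)):
    mad = mean_abs_deviation(data)
    var = statistics.pvariance(data)  # population variance: divides by n, not n - 1
    print("%s set: mean absolute deviation %.2f, variance %.2f" % (name, mad, var))
```

Under these assumptions the mean absolute deviation grows by a factor of roughly 22 between the two sets (about 2.45 versus 53.92), while the variance grows by a factor of roughly 350 (about 8.69 versus 3070.64). Note that `statistics.variance`, without the `p`, divides by n - 1 instead and would give different values.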