Measuring variance

We usually refer to variance as sigma squared, and you'll find out why momentarily, but for now, just know that variance is the average of the squared differences from the mean.
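Written as a formula, that definition might look like the following. The symbols are assumptions made for illustration and are not taken from the text: N is the number of data points, x_i is the i-th data point, and \mu is the mean.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```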

  1. To compute the variance of a dataset, you first figure out its mean. Let's say I have some data that could represent anything, say the maximum number of people standing in line during a given hour. In the first hour, I observed 1 person standing in line, then 4, then 5, then 4, then 8.
  2. The first step in computing the variance is just to find the mean, or the average, of that data. I add them all up and divide the sum by the number of data points: (1+4+5+4+8)/5 = 4.4, which is the average number of people standing in line.
  3. Now the next step is to find the difference from the mean for each data point. I know that the mean is 4.4. So for my first data point, I have 1, so 1 - 4.4 = -3.4. The next data point is 4, so 4 - 4.4 = -0.4, and so on and so forth. OK, so I end up with both positive and negative numbers that represent the deviation from the mean for each data point (-3.4, -0.4, 0.6, -0.4, 3.6).
  4. Now what I need is a single number that represents the variance of this entire dataset. So, the next thing I'm going to do is find the square of these differences. I'm just going to go through each one of those raw differences from the mean and square them. This is for a couple of different reasons:
    • First, I want to make sure that negative differences count just as much as positive differences. Otherwise, they will cancel each other out. That'd be bad.
    • Second, I also want to give more weight to the outliers, so this amplifies the effect of things that are very different from the mean while still making sure that the negatives and positives are comparable (11.56, 0.16, 0.36, 0.16, 12.96).

Let's look at what happens there: (-3.4)² is a positive 11.56, and (-0.4)² ends up being a much smaller number, 0.16, because 4 is much closer to the mean of 4.4. Likewise, 5 is close to the mean, so (0.6)² is only 0.36. But as we get up to the positive outlier, (3.6)² ends up being 12.96. That gives us: (11.56, 0.16, 0.36, 0.16, 12.96).
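The steps so far can be sketched in a few lines of Python. This is only an illustration using the example data; the variable names are made up for this sketch, and the values are rounded when printed because floating-point subtraction leaves tiny errors (1 - 4.4 is not stored as exactly -3.4).

```python
# Maximum number of people in line for each hour, from the example.
data = [1, 4, 5, 4, 8]

# Steps 1 and 2: the mean is the sum divided by the number of data points.
mean = sum(data) / len(data)

# Step 3: difference from the mean for each data point.
differences = [x - mean for x in data]

# Step 4: square each difference so negatives and positives both count,
# and values far from the mean get more weight.
squared_differences = [d ** 2 for d in differences]

print(mean)
print([round(d, 2) for d in differences])
print([round(s, 2) for s in squared_differences])
```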

To find the actual variance value, we just take the average of all those squared differences. So we add up all these squared differences, which gives 25.2, divide the sum by 5, the number of values that we have, and we end up with a variance of 5.04.
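As a rough check on that arithmetic, here is a minimal sketch of the whole calculation in Python. The helper name population_variance is hypothetical, chosen just for this example. The standard library's statistics.pvariance also divides by the number of values, so it should agree with the hand calculation.

```python
import statistics


def population_variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Maximum number of people in line for each hour, from the example.
data = [1, 4, 5, 4, 8]

variance = population_variance(data)

# Rounded for display, since floating-point arithmetic is not exact.
print(round(variance, 2))

# Cross-check against the standard library.
print(statistics.pvariance(data))
```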

OK, that's all variance is.
