
Basic summary statistics

Practitioners of descriptive analytics use a set of four summary statistics to quickly understand a dataset. With practice, you should be able to strengthen your intuition about each of these statistical measurements. In fact, computing them is a great place to start with most problem statements that you will face. The four summary statistics are described as follows: 

  • Location: The center of the data; this can be measured by the mean (average), median, or mode. The median is the middle value, with 50% of the data on either side of it, and the mode is the most frequently occurring value, that is, the peak of the distribution. 
  • Spread: How the data is spread around the center; this can be measured with the variance, which is the average squared distance of each data point from the mean, or the standard deviation, which is the square root of the variance. 
  • Shape: A description of how symmetrically the data is distributed around its center. This is usually expressed as the skew direction. You can refer to the following diagram for a negative skew example, in which the longer tail points toward the lower values. In the case of positive skew, the tail simply points in the opposite direction. 
  • Correlation: A measurement of how strongly one variable depends on another. The most common measure is the Pearson correlation coefficient, usually denoted by "r", which lies between -1 (a full negative correlation) and +1 (a full positive correlation). A value of 0 signifies no linear correlation. 
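
To make the four measurements concrete, here is a minimal Python sketch that computes each of them using only the standard library. The sample values in `scores` and `hours`, and the helper names `skewness` and `pearson_r`, are hypothetical and chosen purely for illustration. The sketch uses the population forms of variance and standard deviation (dividing by n), and it measures skew as the average cubed z-score, which is one common definition among several.

```python
import statistics
from math import sqrt


def skewness(values):
    """Population skewness: the average cubed z-score of the values."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical sample data, not taken from any real dataset.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
hours = [1, 2, 2, 3, 3, 4, 5, 6]

# Location: three ways to describe the center.
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))

# Spread: variance, and its square root, the standard deviation.
print("variance:", statistics.pvariance(scores))
print("standard deviation:", statistics.pstdev(scores))

# Shape: a positive result means the longer tail points toward higher values.
print("skewness:", skewness(scores))

# Correlation: r always falls between -1 and +1.
print("r:", pearson_r(hours, scores))
```

For this sample, the mean (5) sits above the median (4.5), which is consistent with the positive skew value the sketch reports. In day-to-day work you would more likely rely on a data analysis library for these calculations; the hand-written helpers are here only to show what each number represents.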

Take a look at the following diagram for a visualization of the points described in this section:
