
Statistics

For a given dataset, we often summarize the data by its central position. Measures that do this are known as measures of central tendency, a kind of summary statistic. There are several such measures, including the mean, median, and mode. The mean is the most widely used, but different scenarios call for different measures. The following examples show how to compute each of them in R.

Mean

The mean is the equally weighted average of the sample. For example, we can compute the arithmetic mean of Volume in the dataset Sampledata by executing the following code:

mean(Sampledata$Volume) 

Median

The median is the middle value of the data when it is arranged in sorted order. For an even number of observations, it is the average of the two middle values. It can be computed by executing the following code:

median(Sampledata$Volume) 
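
To see why the choice of measure matters, the following minimal sketch compares the mean and the median on a small made-up vector in which one value is an outlier. The vector volume is a hypothetical stand-in, not the Volume column of Sampledata:

```r
# Hypothetical volumes; the last value is a deliberate outlier
volume <- c(100, 120, 130, 150, 5000)

mean(volume)     # 1100: pulled upward by the outlier
median(volume)   # 130: unaffected by the outlier
```

When a few extreme observations are present, the median is usually the more representative summary.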

Mode

The mode is the value that occurs most frequently in the attribute. R has no built-in function for the statistical mode (the base function mode() returns the storage mode of an object instead), so we write our own function to compute it:

findmode <- function(x) { 
   uniqx <- unique(x) 
   uniqx[which.max(tabulate(match(x, uniqx)))] 
} 
findmode(Sampledata$return) 

Executing the preceding code gives the mode of the return attribute of the dataset.
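
One detail worth knowing is how findmode behaves when several values are tied for the highest frequency. The sketch below, on a hypothetical vector x, suggests that it returns the tied value that appears first in the data. The all_modes line is an illustrative addition, not part of the original function:

```r
findmode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical data: 3 and 5 both occur twice
x <- c(2, 3, 3, 5, 5, 1)

first_mode <- findmode(x)   # 3, the first of the tied values in order of appearance

# One possible way to list every tied value (illustrative, not from the original)
counts <- tabulate(match(x, unique(x)))
all_modes <- unique(x)[counts == max(counts)]   # 3 5

mode(x)   # "numeric": base mode() reports the storage mode, not the statistical mode
```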

Summary

We can also generate basic statistics of a column by executing the following code:

summary(Sampledata$Volume) 

This generates the minimum, first quartile (Q1), median, mean, third quartile (Q3), and maximum.
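
As a quick illustration of the output, here is summary() applied to a small hypothetical vector (not the Sampledata column). The result is a named numeric vector, so individual statistics can be extracted by name:

```r
# Hypothetical data with one large value
x <- c(1, 2, 3, 4, 100)

s <- summary(x)
print(s)

names(s)         # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s[["Median"]]    # 3
s[["Mean"]]      # 22
```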

Moment

Moments describe characteristics of a distribution such as its variance and skewness. The following code computes the third-order central moment of the attribute Volume; one can change the order to obtain other characteristics. The moment function comes from the package e1071, which must be installed and loaded first:

library(e1071)   # provides moment(), skewness(), and kurtosis()
moment(Sampledata$Volume, order=3, center=TRUE) 
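
To make the definition concrete, a central moment of a given order is the average of the deviations from the mean raised to that order. The following base R sketch computes it by hand on a hypothetical vector. The helper central_moment is an illustrative name, not a function from e1071:

```r
# Hand-rolled central moment: mean of (x - mean(x))^order
central_moment <- function(x, order) {
  mean((x - mean(x))^order)
}

# Hypothetical data with mean 4
x <- c(1, 2, 3, 4, 10)

m2 <- central_moment(x, 2)   # 10
m3 <- central_moment(x, 3)   # 36

# The second central moment divides by n, whereas var() divides by n - 1
n <- length(x)
var(x) * (n - 1) / n         # 10
```

The second central moment is therefore the population form of the variance, which is slightly smaller than the sample variance returned by var().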

Kurtosis

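Kurtosis measures whether the data is heavy-tailed or light-tailed relative to a normal distribution. Datasets with high kurtosis tend to have heavy tails, or outliers. Datasets with low kurtosis tend to have light tails and fewer outliers. The computed value is interpreted by comparing it with the kurtosis of a normal distribution, which is 3. The kurtosis function in e1071 reports excess kurtosis, that is, kurtosis minus 3, so a normal distribution scores 0, positive values indicate heavy tails (leptokurtic), and negative values indicate light tails (platykurtic).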

The kurtosis of Volume is given by the following code:

kurtosis(Sampledata$Volume) 

It gives the value 5.777117, which indicates that the distribution of Volume is leptokurtic, that is, heavier-tailed than a normal distribution.
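
For readers who want to see what is being computed, the following base R sketch calculates excess kurtosis by hand on a hypothetical vector. It assumes the estimator that the e1071 documentation describes as its default (type = 3), namely the fourth central moment divided by the squared sample variance, minus 3. The helper excess_kurtosis is an illustrative name:

```r
# Excess kurtosis: m4 / s^4 - 3, where s^2 is the sample variance
# (assumed to match the default type = 3 estimator in e1071)
excess_kurtosis <- function(x) {
  m4 <- mean((x - mean(x))^4)   # fourth central moment
  m4 / var(x)^2 - 3
}

# Hypothetical data
x <- c(1, 2, 3, 4, 10)

k <- excess_kurtosis(x)
k > 0   # TRUE would suggest heavier tails than a normal distribution
```

With only a handful of observations the estimate is very noisy, so a sketch like this is best read as a check on the formula rather than as evidence about tail behavior.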

Skewness

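Skewness measures the asymmetry of the distribution. A negative value indicates a left-skewed distribution with a longer left tail, and a positive value indicates a right-skewed distribution with a longer right tail. As a rule of thumb, if the mean of the data values is less than the median, the distribution is left-skewed, and if the mean is greater than the median, the distribution is right-skewed.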

The skewness of Volume is computed as follows in R:

skewness(Sampledata$Volume) 

This gives the result 1.723744; the positive value means that Volume is right-skewed.
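
The link between the sign of the skewness and the mean-versus-median rule of thumb can be checked by hand. The base R sketch below uses a hypothetical vector and assumes the estimator that the e1071 documentation describes as its default (type = 3): the third central moment divided by the cubed sample standard deviation. The helper sample_skewness is an illustrative name:

```r
# Skewness: m3 / s^3, where s is the sample standard deviation
# (assumed to match the default type = 3 estimator in e1071)
sample_skewness <- function(x) {
  m3 <- mean((x - mean(x))^3)   # third central moment
  m3 / sd(x)^3
}

# Hypothetical data with a long right tail
x <- c(1, 2, 3, 4, 10)

sk <- sample_skewness(x)
sk > 0                 # TRUE: right-skewed
mean(x) > median(x)    # TRUE: mean 4 exceeds median 3, consistent with the rule of thumb
```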

Note

To compute skewness and kurtosis, we need to install the package e1071 and load it with library(e1071).
