Book title: Hands-On Data Science with R
Authors: Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
Updated: 2021-06-10 19:12:32

Useful functions to draw automated summaries

A standard first step in any data analysis with R is to get a glimpse of the data. Calling head() and tail() on a data frame is common among R users; people use both to check whether the data was read correctly. The former shows the first few observations, while the latter shows the last few. That's useful, but not what we're looking for.
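For reference, here is a minimal sketch of that quick check. It uses R's built-in mtcars dataset, which is an assumption made for illustration and not the data used in this chapter:

```r
# mtcars ships with base R; it stands in for any freshly read data frame
first_rows <- head(mtcars)         # first six rows by default
last_rows  <- tail(mtcars, n = 3)  # last three rows

print(first_rows)
print(last_rows)
```

Both functions accept an n argument, so you can ask for as many or as few rows as you need.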

Another function commonly called at the beginning of a data analysis is summary(). A short demonstration follows:

summary(big_sample)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# -11.317   6.586   9.980   9.971  13.345  32.341

This function behaves differently depending on the class of the object you pass to it. For numeric vectors and for the numeric columns of a data frame, it displays measures of central tendency (median and mean) along with other useful information about how your variables are distributed (minimum value, maximum value, first quartile, and third quartile).
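To see how the output changes with the class of the input, consider the following sketch. The objects x, f, and df are made up for illustration; they are not the chapter's big_sample:

```r
# Hypothetical inputs of three different classes
set.seed(1)
x  <- rnorm(100, mean = 10, sd = 5)         # numeric vector
f  <- factor(c("a", "b", "a", "c"))         # factor
df <- data.frame(value = x[1:4], group = f) # data frame mixing both

num_summary <- summary(x)   # min, quartiles, median, mean, max
fac_summary <- summary(f)   # counts per factor level
df_summary  <- summary(df)  # one summary per column

print(num_summary)
print(fac_summary)
print(df_summary)
```

For the factor, summary() returns level counts rather than quartiles, which is why the class of the input matters.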

Some packages have similar functions. Let's make sure the psych, Hmisc, and pastecs packages are already installed:

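# Install only the packages that are not installed yet
pkgs <- c('psych', 'Hmisc', 'pastecs')
# installed.packages() returns a matrix; package names are its row names
pkgs <- pkgs[!(pkgs %in% rownames(installed.packages()))]
if (length(pkgs) != 0) install.packages(pkgs)
rm(pkgs)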

Now, we can try some descriptive summaries from these packages:

psych::describe(big_sample)
Hmisc::describe(big_sample)
pastecs::stat.desc(big_sample)

Each of these functions reports a different set of statistics about the data it is given. I encourage you to try them all. Which of them do you like best?

This section has introduced you to some of the most popular measures of central tendency and dispersion. They are used not only for descriptive analysis but also for inference. It's hard to find a model that doesn't rely on the mean and the variance (or the standard deviation) in some way.

The arithmetic mean of many predictions is usually more accurate than a typical individual prediction, particularly when the individual errors are independent and tend to cancel out. This phenomenon is known as the Wisdom of the Crowd.
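A small simulation can illustrate the idea. This is only a sketch under stated assumptions: the true value, the number of guesses, and the noise level are all hypothetical, and the guesses are assumed to be independent and unbiased:

```r
set.seed(42)
truth   <- 100                                 # hypothetical true value
guesses <- rnorm(500, mean = truth, sd = 20)   # 500 independent, unbiased guesses

crowd_error       <- abs(mean(guesses) - truth)  # error of the averaged guess
individual_errors <- abs(guesses - truth)        # error of each single guess

# Share of individuals whose guess is worse than the crowd's average
share_beaten <- mean(individual_errors > crowd_error)

print(crowd_error)
print(mean(individual_errors))
print(share_beaten)
```

If the guesses shared a common bias, averaging would not remove it, so the advantage depends on the independence assumption above.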

With mean and standard deviation at hand, it's time to move on to inference. The inferences discussed next can be found under an umbrella called statistical hypothesis testing.
