
Useful functions to draw automated summaries

A very standard step in any data analysis with R is to get a glimpse of the data. Calling the head() and tail() functions on a DataFrame is quite common among R users; people tend to use both to check whether the data was read correctly. The former shows the first few observations, while the latter displays the last ones. That's useful, but not what we're looking for.
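For reference, here is a minimal sketch of both functions. It uses the mtcars data set that ships with base R rather than the data used in this chapter, and the choice of n values is purely illustrative.

```r
# mtcars ships with base R (datasets package), so no download is needed
data(mtcars)

# First six rows (the default is n = 6)
head(mtcars)

# Only the first three and the last three rows
head(mtcars, n = 3)
tail(mtcars, n = 3)

# A negative n drops rows instead: everything except the last 29 rows
head(mtcars, n = -29)
```

Both functions return an object of the same class as their input, so the result can be inspected or stored like any other data frame.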

Another function commonly called at the beginning of a data analysis is summary(). Here is a short demonstration:

summary(big_sample)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -11.317 6.586 9.980 9.971 13.345 32.341

This function works differently depending on the class of the object you pass to it. For numeric vectors, and for each numeric column of a DataFrame, it displays measures of central tendency (median and mean) along with other useful information about how your variables are distributed (minimum value, first quartile, third quartile, and maximum value).
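To see this class-dependent behavior, the sketch below builds a hypothetical stand-in for big_sample (10,000 normal draws with mean 10 and standard deviation 5; the chapter's actual data may have been generated differently) and a small, made-up DataFrame with a factor column. A numeric vector gets the six-number summary, while a factor column is summarized by counts per level.

```r
set.seed(42)
# Hypothetical stand-in for big_sample; not necessarily the book's data
big_sample <- rnorm(10000, mean = 10, sd = 5)

# Numeric vector: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(big_sample)

# Data frame: summary() is applied column by column;
# the factor column is summarized by counts per level
df <- data.frame(
  value = big_sample[1:6],
  group = factor(c("a", "a", "b", "b", "b", "c"))
)
summary(df)

# The method R dispatches to depends on these classes
class(big_sample)
class(df)
```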

Some packages have similar functions. Let's make sure the psych, Hmisc, and pastecs packages are already installed: 

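pkgs <- c('psych','Hmisc','pastecs')
# Keep only the packages not yet installed (match against package names only)
pkgs <- pkgs[!(pkgs %in% rownames(installed.packages()))]
# Install the missing ones, if any
if(length(pkgs) != 0) {install.packages(pkgs)}
rm(pkgs)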

Now, we can try some descriptive summaries from these packages:

psych::describe(big_sample)
Hmisc::describe(big_sample)
pastecs::stat.desc(big_sample)

Each of these functions outputs a different set of information about the data it receives. I encourage the reader to try them all. Which one do you like best?

This section has introduced you to some of the most popular measures of central tendency and dispersion. They are used not only for descriptive analysis but also for inference. It's hard to find a model that doesn't benefit in some way from the mean and variance (and standard deviation).
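As a quick recap, the sketch below computes these measures with base R functions. The vector x is hypothetical example data (normal draws), chosen only so the code runs on its own.

```r
set.seed(1)
x <- rnorm(1000, mean = 10, sd = 5)  # hypothetical example data

# Central tendency
mean(x)
median(x)

# Dispersion
var(x)     # sample variance (denominator n - 1)
sd(x)      # sample standard deviation, equal to sqrt(var(x))
range(x)   # minimum and maximum
IQR(x)     # interquartile range: 3rd quartile minus 1st quartile
quantile(x, probs = c(0.25, 0.5, 0.75))
```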

Averaging many individual predictions with the arithmetic mean usually yields a more accurate estimate than the individual predictions taken on their own. This phenomenon is known as the wisdom of the crowd.
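A small simulation can make the idea concrete. This is a sketch under a loud assumption: every guess equals a hypothetical true value plus independent, unbiased normal noise. The true value, crowd size, and noise level are all made up for illustration.

```r
set.seed(2024)
true_value <- 100   # hypothetical quantity being guessed
n_guessers <- 500   # hypothetical crowd size

# Assumption: each guess is the true value plus independent, unbiased noise
guesses <- true_value + rnorm(n_guessers, mean = 0, sd = 20)

crowd_estimate <- mean(guesses)
crowd_error <- abs(crowd_estimate - true_value)
individual_errors <- abs(guesses - true_value)

crowd_error                              # error of the averaged prediction
median(individual_errors)                # a typical individual error
mean(individual_errors > crowd_error)    # share of guessers beaten by the crowd mean
```

Under these assumptions the noise largely cancels out when averaged. If all guesses shared a common bias, however, averaging would not remove it, so the effect depends on the errors being independent.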

With the mean and standard deviation at hand, it's time to move on to inference. The inferences discussed next fall under the umbrella of statistical hypothesis testing.
