- Hands-On Data Science with R
- Vitor Bianchi Lanzetta Nataraj Dasgupta Ricardo Anjoleto Farias
- 2021-06-10 19:12:32
Useful functions to draw automated summaries
A standard first step in any data analysis with R is to get a glimpse of the data. Calling head() and tail() on a data frame is common among R users; people tend to use both to check whether the data was read correctly. The former shows the first few observations, while the latter shows the last few. That's useful, but not what we're looking for.
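As a quick illustration of that check, here is a minimal sketch. It assumes the built-in mtcars dataset as a stand-in for whatever data you have just read; the names first_rows and last_rows are introduced only for this example.

```r
# mtcars ships with base R and stands in for freshly imported data
first_rows <- head(mtcars, n = 3)  # first three observations
last_rows  <- tail(mtcars, n = 3)  # last three observations

print(first_rows)
print(last_rows)
```

Both functions return six rows when n is left out; the n argument simply changes how many are shown.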
There is another function commonly called at the beginning of a data analysis process. It's called summary(). A short demonstration lies ahead:
summary(big_sample)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -11.317 6.586 9.980 9.971 13.345 32.341
This function behaves differently depending on the class of the object you pass to it. For numeric vectors and the numeric columns of data frames, it displays measures of central tendency (median and mean) along with other useful information about how your variables are distributed (minimum, maximum, first quartile, and third quartile).
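To see how the output changes with the class of the input, consider the following sketch. The objects num_vec, fac, and df are hypothetical and exist only for this illustration.

```r
# A numeric vector: summary() returns the six-number summary
num_vec <- c(2, 4, 4, 5, 10)
num_summary <- summary(num_vec)
print(num_summary)

# A factor: summary() returns the count of each level instead
fac <- factor(c("a", "b", "a"))
fac_summary <- summary(fac)
print(fac_summary)

# A data frame: each column is summarized according to its own class
df <- data.frame(value = num_vec, group = factor(c("a", "b", "a", "b", "a")))
print(summary(df))
```

For the data frame, the numeric column gets quartiles and the mean, while the factor column gets level counts.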
Some packages have similar functions. Let's make sure the psych, Hmisc, and pastecs packages are already installed:
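# Packages used for the descriptive summaries below
pkgs <- c('psych','Hmisc','pastecs')
# Keep only the packages that are not installed yet
pkgs <- pkgs[!(pkgs %in% rownames(installed.packages()))]
if(length(pkgs) != 0) {install.packages(pkgs)}
rm(pkgs)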
Now, we can try some descriptive summaries from these packages:
psych::describe(big_sample)
Hmisc::describe(big_sample)
pastecs::stat.desc(big_sample)
Each of these functions outputs a different set of statistics about the data passed to it. I encourage the reader to try them all. Which of them do you like best?
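If big_sample is not available in your session, the comparison can still be tried on simulated data. The sketch below assumes a hypothetical stand_in vector drawn from a normal distribution; the seed, mean, and standard deviation are arbitrary choices, not the values used to build big_sample. Each call is guarded so the code still runs when a package is missing.

```r
# Hypothetical stand-in for big_sample (parameters are assumptions)
set.seed(1)
stand_in <- rnorm(1000, mean = 10, sd = 5)

# Collect whatever summaries are available in this installation
results <- list()
if (requireNamespace("psych", quietly = TRUE)) {
  results$psych <- psych::describe(stand_in)
}
if (requireNamespace("Hmisc", quietly = TRUE)) {
  results$Hmisc <- Hmisc::describe(stand_in)
}
if (requireNamespace("pastecs", quietly = TRUE)) {
  results$pastecs <- pastecs::stat.desc(stand_in)
}

print(results)
```

Printing the list side by side makes it easier to see which statistics each package reports.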
This section has introduced you to some of the most popular measures of central tendency and dispersion. They are used not only for descriptive analysis but also for inference. It's hard to find a model that doesn't benefit from the mean and variance (or standard deviation) in some way.
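As a brief recap, these measures can be computed directly with base R. The vector x below is a hypothetical example, chosen so the arithmetic is easy to follow by hand.

```r
# Hypothetical data, small enough to verify by hand
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_mean <- mean(x)  # arithmetic mean
x_var  <- var(x)   # sample variance (divides by n - 1)
x_sd   <- sd(x)    # sample standard deviation, the square root of var(x)

print(c(mean = x_mean, variance = x_var, sd = x_sd))
```

Note that var() and sd() use the sample (n - 1) denominator, so the variance here is 32/7 rather than 32/8.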
With mean and standard deviation at hand, it's time to move on to inference. The inferences discussed next can be found under an umbrella called statistical hypothesis testing.