Chapter 2. R and High-Performance Computing

The software environment R (R Development Core Team, 2015) is among the most widely used software in statistics, and it is used heavily throughout this book. The methods described in the following chapters are applied in practice, and each application is shown in R. A book on simulation and data science in R therefore needs its own introduction to R, especially to the features that support efficient computation.

In this chapter, you will be given a brief introduction to the functionality of R. It does not replace a general introduction to R; instead, it highlights selected topics such as modern visualization tools and efficient data manipulation packages. These topics, among others from this chapter, are important for understanding the examples and the R code in the book.

Rather than replicating a comprehensive R introduction, it is more important to cover aspects related to computer-intensive methods and computationally expensive simulation in data science. We therefore introduce packages and methods that work efficiently with large data sets or that can be applied efficiently in simulations.

Since data manipulation is central to every analysis, and data scientists probably spend more than 70 percent of their working time on it (before applying any statistical methods), we will concentrate on the packages dplyr (Wickham and Francois, 2015) and data.table (Dowle et al., 2015).

At the end of this chapter, we will discuss packages for high-performance computing (for example, the package snow; Tierney et al., 2015) and useful profiling tools.

Tip

Other important topics, such as creating your own R packages, integrated tests, and dynamic reporting, are not covered in this book. Experienced R users should nevertheless make use of these features, and we suggest reading the specialized literature on these topics.

Experts in R may skip this chapter and start immediately with Chapter 3, The Discrepancy Between Pencil-Driven Theory and Data-Driven Computational Solutions. Readers new to R should also read a general introduction to R alongside or before this chapter.