
  • Hands-On Data Science with R
  • Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
  • 2021-06-10 19:12:34

Data Wrangling with R

"You can have data without information, but you cannot have information without data."
                                                                                                      – Daniel Keys Moran

Data wrangling has long been one of R's core strengths. R offers relatively fast, on-demand in-memory processing, along with a wide array of packages that speed up the data curation work that wrangling involves.

R is especially valuable when working with datasets of more than about 1 million rows (a Microsoft Excel worksheet is limited to 1,048,576 rows) or with files on the order of gigabytes. Because it provides easy-to-use functions for common day-to-day tasks such as aggregations, joins, and pivots, R is also arguably much simpler to use than some of the GUI-based tools available for similar tasks.
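As a rough illustration of the day-to-day tasks mentioned above, the following sketch performs an aggregation, a join, and a pivot using only base R functions (`aggregate`, `merge`, and `reshape`). The `sales` and `managers` data frames and all their values are hypothetical, invented purely for this example; packages such as dplyr and data.table offer their own ways of doing the same things.

```r
# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  region  = c("East", "East", "West", "West"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  revenue = c(100, 150, 80, 120)
)
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Ana", "Raj")
)

# Aggregation: total revenue per region
totals <- aggregate(revenue ~ region, data = sales, FUN = sum)

# Join: attach the (hypothetical) manager to every sales row
joined <- merge(sales, managers, by = "region")

# Pivot: one row per region, one revenue column per quarter
wide <- reshape(sales, idvar = "region", timevar = "quarter",
                direction = "wide")

print(totals)
print(joined)
print(wide)
```

Each step is a single function call on an ordinary data frame, which is the kind of brevity the comparison with GUI-based tools refers to.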

At a high level, the core categories of data wrangling with R are data extraction, data cleansing, data transformation, and data consolidation. This is a simplified categorization of the basic tenets of data wrangling, and we'll delve deeper into these individual subject areas in the next few sections. The challenge stems largely from the fact that data comes in a range of data types and data formats from a diverse pool of data sources. Here, data type refers to the characteristics of the contents of the files, format refers to the file format in which data is delivered, and source refers to the systems from which you receive data. There is no universal convention for these: the data may exist in a CSV file or a binary SAS file, or be present in a database, each of which can have its own nuances and challenges.
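To make the distinction between format and data type concrete, here is a minimal sketch in base R. It writes a small CSV file (the format) to a temporary location and reads it back, then inspects and corrects the column classes (the data types). The file contents and the column names `id`, `signup_date`, and `amount` are hypothetical and exist only for this example.

```r
# Write a tiny, made-up CSV file to a temporary path
path <- tempfile(fileext = ".csv")
writeLines(
  c("id,signup_date,amount",
    "1,2021-01-15,10.5",
    "2,2021-02-20,7"),
  path
)

# Extraction: the CSV format carries no type information,
# so read.csv() has to guess a class for each column
raw <- read.csv(path, stringsAsFactors = FALSE)
col_classes <- sapply(raw, class)
print(col_classes)

# Cleansing: the date column arrives as plain text and
# has to be converted explicitly
raw$signup_date <- as.Date(raw$signup_date)
print(class(raw$signup_date))
```

The same columns delivered through a SAS file or a database table could arrive already typed, which is one reason each source tends to need its own handling.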

In this chapter, we will cover the following topics:

  • Introduction to data wrangling with R
  • The foundational tools of data wrangling: dplyr, data.table, and others
  • ETL with R: data extraction
  • ETL with R: data transformation
  • ETL with R: data load
  • Helpful data wrangling tools for everyday use
  • Tutorial