- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:30
Merging data
Merging data enables us to understand how different data sources relate to each other. The merge operation in R is similar to the join operation in a database, which combines fields from two datasets using values that are common to each.
Getting ready
Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to merge salaries and employees:
- As employees and salaries share the emp_no column, we can merge these two datasets using emp_no as the join key:
  > employees_salary <- merge(employees, salaries, by="emp_no")
  > head(employees_salary,3)
    emp_no birth_date first_name last_name salary  from_date    to_date
  1  10001 1953-09-02     Georgi   Facello  60117 1986-06-26 1987-06-26
  2  10001 1953-09-02     Georgi   Facello  62102 1987-06-26 1988-06-25
  3  10001 1953-09-02     Georgi   Facello  66596 1989-06-25 1990-06-25
- Or, we can perform a left join by setting all.x to TRUE, which keeps every row from employees even when it has no match in salaries:
  > merge(employees, salaries, by="emp_no", all.x = TRUE)
- In addition to the merge function, we can install and load the plyr package to manipulate data:
  > install.packages("plyr")
  > library(plyr)
- Besides the standard merge function, we can use the join function in plyr to merge data:
  > join(employees, salaries, by="emp_no")
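To try the first step without the full dataset, here is a minimal sketch that uses two small stand-in data frames. The rows below are made up for illustration and are not taken from the recipe's data files; only the column name emp_no and the merge call come from the steps above.

```r
# Toy stand-ins for the employees and salaries data frames (illustrative values)
employees <- data.frame(
  emp_no     = c(10001, 10002, 10003),
  first_name = c("Georgi", "Bezalel", "Parto"),
  stringsAsFactors = FALSE
)
salaries <- data.frame(
  emp_no = c(10001, 10001, 10002),
  salary = c(60117, 62102, 65828)
)

# Default merge is an inner join: one output row per matching
# (employee, salary) pair; employee 10003 has no salary row and is dropped
employees_salary <- merge(employees, salaries, by = "emp_no")
print(employees_salary)
```

Because an employee can have several salary records, the merged data frame can contain more rows per employee than employees does.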
How it works…
As with tables in a database, we sometimes need to combine two datasets in order to correlate their data. In R, we can combine two data frames that share common values using the merge function.
In the merge function, we use both salaries and employees as our input data frames. For the by parameter, we specify emp_no as the key to join these two tables. Rows with the same emp_no value are then merged into a new data frame; by default, only rows that have a match in both data frames are kept. However, sometimes we want to perform either a left join or a right join in order to preserve every row from either employees or salaries. To perform a left join, we can set all.x to TRUE. Every row from the employees dataset is then preserved in the merged dataset. On the other hand, if we want to preserve all rows from the salaries dataset, we can set all.y to TRUE.
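The difference between the three join variants is easiest to see on a small example. The sketch below uses made-up data frames (the values are assumptions for illustration) in which one employee has no salary and one salary row has no matching employee.

```r
# Toy data: employees 2 and 3 have no salary; salary row 4 has no employee
employees <- data.frame(emp_no = c(1, 2, 3), first_name = c("A", "B", "C"))
salaries  <- data.frame(emp_no = c(1, 1, 4), salary = c(100, 110, 200))

# Inner join (default): only emp_no values present in both data frames
inner <- merge(employees, salaries, by = "emp_no")

# Left join: keep every row of employees; missing salaries become NA
left <- merge(employees, salaries, by = "emp_no", all.x = TRUE)

# Right join: keep every row of salaries; missing names become NA
right <- merge(employees, salaries, by = "emp_no", all.y = TRUE)

print(inner)
print(left)
print(right)
```

Rows without a partner on the other side are filled with NA rather than dropped, so it is worth checking for NA values after a left or right join.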
In addition to using the built-in merge function, we can install the plyr package to merge datasets. The usage of join is very similar to merge; we only have to specify the data to join and the columns with the common values within the by parameter.
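One difference worth knowing is that join performs a left join by default (its type argument defaults to "left"), whereas merge performs an inner join. The following sketch, which assumes plyr is already installed and uses made-up toy data, shows the default behaviour next to an explicit inner join.

```r
library(plyr)  # assumes install.packages("plyr") has been run beforehand

# Toy data (illustrative values): employees 2 and 3 have no salary rows
employees <- data.frame(emp_no = c(1, 2, 3), first_name = c("A", "B", "C"))
salaries  <- data.frame(emp_no = c(1, 1, 4), salary = c(100, 110, 200))

# Default type = "left": every employees row is kept, unmatched salary is NA
left_default <- join(employees, salaries, by = "emp_no")

# type = "inner" matches the default behaviour of merge
inner <- join(employees, salaries, by = "emp_no", type = "inner")

print(left_default)
print(inner)
```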
There's more…
In the plyr package, we can use the join_all function to recursively join the datasets in a list. Here, we can use join_all to join the employees and salaries datasets by emp_no:
> join_all(list(employees, salaries), "emp_no")
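join_all becomes more useful once the list holds more than two data frames. The sketch below assumes plyr is installed and adds a hypothetical third data frame, titles, which is not part of the recipe and is introduced only to illustrate the chained join; all values are made up.

```r
library(plyr)  # assumes plyr is already installed

# Toy data; `titles` is a hypothetical third dataset used only for illustration
employees <- data.frame(emp_no = c(1, 2), first_name = c("A", "B"))
salaries  <- data.frame(emp_no = c(1, 2), salary = c(100, 200))
titles    <- data.frame(emp_no = c(1, 2), title = c("Engineer", "Staff"))

# Joins the first two data frames, then joins the result with the third,
# each time on emp_no (left join by default)
combined <- join_all(list(employees, salaries, titles), by = "emp_no")
print(combined)
```

The result has one column per attribute from all three inputs, keyed by emp_no.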