不朽的情缘五滴血视频

書名： R for Data Science Cookbook
作者名： Yu Wei Chiu (David Chiu)
本章字數： 463字
更新時間： 2021-07-14 10:51:30

Merging data

Merging data enables us to understand how different data sources relate to each other. The merge operation in R is similar to the join operation in a database, which combines fields from two datasets using values that are common to each.

Getting ready

Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.

How to do it…

Perform the following steps to merge salaries and employees:

As employees and salaries are common in emp_no, we can merge these two datasets using emp_no as the join key:

> employees_salary <- merge(employees, salaries, by="emp_no")
> head(employees_salary,3)
 emp_no birth_date first_name last_name salary from_date to_date
1 10001 1953-09-02 Georgi Facello 60117 1986-06-26 1987-06-26
2 10001 1953-09-02 Georgi Facello 62102 1987-06-26 1988-06-25
3 10001 1953-09-02 Georgi Facello 66596 1989-06-25 1990-06-25

Or, we can assign NULL to the attribute that we want to drop:
```
> merge(employees, salaries, by="emp_no", all.x =TRUE)
```
In addition to the merge function, we can install and load the plyr package to manipulate data:
```
> install.packages("plyr")
> library(plyr)
```
Besides the standard merge function, we can use the join function in plyr to merge data:
```
> join(employees, salaries, by="emp_no")
```

How it works…

Similarly to data tables in a database, we sometimes need to combine two datasets for the purpose of correlating data. In R, we can simply combine two different data frames with common values using the merge function.

In the merge function, we use both salaries and employees as our input data frame. For the by parameter, we can specify emp_no as the key to join these two tables. We will then see that the data with the same emp_no value has now merged into a new data frame. However, sometimes we want to perform either a left join or a right join for the purpose of preserving every data value from either employees or salaries. To perform the left join, we can set all.x to TRUE. Then, we can find every row from the employees dataset preserved in the merged dataset. On the other hand, if one wants to preserve all rows from the salaries dataset, we can set all.y to TRUE.

In addition to using the built-in merge function, we can install the plyr package to merge datasets. The usage of join is very similar to merge; we only have to specify the data to join and the columns with the common values within the by parameter.

There's more…

In the plyr package, we can use the join_all function to join recursive datasets within a list. Here, we can use join_all to join the employees and salaries datasets by emp_no:

> join_all(list(employees, salaries), "emp_no")

官术网_书友最值得收藏!

R for Data Science Cookbook

Merging data

Getting ready

How to do it…

How it works…

There's more…