Title: R for Data Science Cookbook
Author: Yu Wei Chiu (David Chiu)
Chapter word count: 445
Last updated: 2021-07-14 10:51:28

Converting data types

If we do not specify a data type during the import phase, R automatically assigns a type to each column of the imported dataset. However, if the assigned type differs from the actual type of the data, further data manipulation becomes difficult. Thus, data type conversion is an essential step during the preprocessing phase.

Getting ready

Complete the previous recipe and import both employees.csv and salaries.csv into an R session. You must also specify column names for these two datasets to be able to perform the following steps.

How to do it…

Perform the following steps to convert the data type:

First, examine the data type of each attribute using the class function:
```
> class(employees$birth_date)
[1] "factor"
```

You can also examine types of all attributes using the str function:

> str(employees)

'data.frame': 10 obs. of 6 variables:
 $ emp_no : int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
 $ birth_date: Factor w/ 10 levels "1952-04-19","1953-04-20",..: 3 10 8 4 5 2 6 7 1 9
 $ first_name: Factor w/ 10 levels "Anneke","Bezalel",..: 5 2 7 3 6 1 10 8 9 4
 $ last_name : Factor w/ 10 levels "Bamford","Facello",..: 2 9 1 4 5 8 10 3 6 7
 $ gender : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
 $ hire_date : Factor w/ 10 levels "1985-02-18","1985-11-21",..: 3 2 4 5 9 7 6 10 1 8
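As a complement to class and str, the following minimal sketch collects the class of every column into one named vector with sapply. The employees_demo data frame is a hypothetical three-row stand-in built from values that appear in the output above (the row pairings are illustrative only), and stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later no longer convert strings to factors by default, so on a recent R version the string columns of the real import would show up as chr rather than Factor.

```r
# Hypothetical stand-in for the employees data frame (illustrative values only).
employees_demo <- data.frame(
  emp_no     = c(10001L, 10002L, 10003L),
  birth_date = c("1953-09-02", "1952-04-19", "1953-04-20"),
  first_name = c("Georgi", "Bezalel", "Parto"),
  gender     = c("M", "F", "M"),
  stringsAsFactors = TRUE  # mimic the default of R versions before 4.0.0
)

# One class per column, returned as a named character vector.
col_classes <- sapply(employees_demo, class)
print(col_classes)
```

The named vector is convenient when you want to check programmatically which columns still need conversion, for example with names(col_classes)[col_classes == "factor"].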

Then, you need to convert both birth_date and hire_date to the date format:

> employees$birth_date <- as.Date(employees$birth_date)
> employees$hire_date <- as.Date(employees$hire_date)
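The calls above work without extra arguments because the dates are stored in the ISO form YYYY-MM-DD. If a source file used another layout, the format argument of as.Date would be needed. The sketch below assumes a hypothetical day/month/year layout that is not part of the recipe's data, and also shows that an unparseable value becomes NA when a format is supplied.

```r
# Hypothetical hire dates written as day/month/year (not the recipe's format).
hire_raw <- factor(c("26/06/1986", "21/11/1985"))

# Convert the factor to character first, then parse with an explicit format.
hire_date <- as.Date(as.character(hire_raw), format = "%d/%m/%Y")
print(hire_date)

# With an explicit format, a value that does not match yields NA.
bad_date <- as.Date("not a date", format = "%Y-%m-%d")
print(is.na(bad_date))
```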

You also need to convert both first_name and last_name into character type:

> employees$first_name <- as.character(employees$first_name)
> employees$last_name <- as.character(employees$last_name)
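To see what as.character does with a factor, the following small sketch (using a hypothetical three-name vector, not the full dataset) contrasts it with as.integer, which returns the internal level codes rather than the labels.

```r
# Hypothetical factor holding three first names from the dataset.
first_name <- factor(c("Georgi", "Bezalel", "Parto"))

# Levels are sorted alphabetically, so the integer codes follow that order.
name_codes <- as.integer(first_name)

# as.character recovers the original labels.
name_labels <- as.character(first_name)

print(levels(first_name))
print(name_codes)
print(name_labels)
```

This is why the factor codes in the str output (such as 5 2 7 3 ...) do not look like names: they index into the sorted level labels.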

Again, you can use str to examine the dataset:

> str(employees)

'data.frame': 10 obs. of 6 variables:
 $ emp_no : int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
 $ birth_date: Date, format: "1953-09-02" ...
 $ first_name: chr "Georgi" "Bezalel" "Parto" "Chirstian" ...
 $ last_name : chr "Facello" "Simmel" "Bamford" "Koblick" ...
 $ gender : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
 $ hire_date : Date, format: "1986-06-26" ...

Furthermore, you can convert from_date and to_date within salaries to the date type:

> salaries$from_date <- as.Date(salaries$from_date)
> salaries$to_date <- as.Date(salaries$to_date)

How it works…

In this recipe, we demonstrated how to convert the data type of each attribute within the dataset. Before converting any attribute, you must first examine its current type. To identify the type of a single attribute, apply the class function to the selected column (for example, employees$birth_date). To inspect the types of all attributes at once, use the str function.

The output of the str function on the employees data frame shows that both birth_date and hire_date are of factor type. However, to calculate a person's age from the birth_date attribute, we need date arithmetic, which factors do not support. Thus, we convert both birth_date and hire_date to date format using the as.Date function.
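As an illustration of why the date type matters, the following sketch computes an approximate age from a single birth date. The reference date is an arbitrary fixed value chosen for the example, and dividing by 365.25 is a common approximation rather than an exact calendar calculation.

```r
# Birth date taken from the str output; reference date is an arbitrary assumption.
birth_date <- as.Date("1953-09-02")
ref_date   <- as.Date("2021-07-14")

# Subtracting two Date objects gives a difftime in days.
age_days <- as.numeric(ref_date - birth_date)

# Approximate age in whole years (365.25 accounts for leap years on average).
age_years <- floor(age_days / 365.25)
print(age_years)
```

The same subtraction on factor columns would not produce a usable result, which is the motivation for calling as.Date first.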

Also, because a factor restricts an attribute to a fixed set of levels, we cannot freely add a record whose values fall outside those levels. As a new employee is unlikely to share exactly the same first name and last name as anyone already in the dataset, we convert last_name and first_name to the character type. We can then proceed to append a new record to the employees dataset in the next recipe. Finally, we also convert from_date and to_date of the salaries dataset to date type so that we can perform date calculations in the next recipe.
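The restriction imposed by factor levels can be seen in a small sketch. It uses a hypothetical three-name vector and direct element assignment (one way of adding a value, chosen here for brevity): the factor rejects an unseen name and stores NA, while the character vector accepts it.

```r
# Hypothetical last names stored as a factor.
name_factor <- factor(c("Facello", "Simmel", "Bamford"))

# "Koblick" is not an existing level, so R warns and stores NA instead.
suppressWarnings(name_factor[4] <- "Koblick")
print(name_factor)

# The same assignment on a character vector simply stores the new value.
name_chr <- c("Facello", "Simmel", "Bamford")
name_chr[4] <- "Koblick"
print(name_chr)
```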

There's more…

Besides using an as function to convert the data type, you can specify the data type during the data import phase. Using the read.csv function as an example, you can specify the data type of each column in the colClasses argument. If you want R to select the data type of a column automatically (for example, to read emp_no as integer type), simply specify NA for that column within colClasses:

> employees <- read.csv('~/Desktop/employees.csv', colClasses = c(NA, "Date", "character", "character", "factor", "Date"), header = FALSE)
> str(employees)
'data.frame': 10 obs. of 6 variables:
 $ V1: int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
 $ V2: Date, format: "1953-09-02" ...
 $ V3: chr "Georgi" "Bezalel" "Parto" "Chirstian" ...
 $ V4: chr "Facello" "Simmel" "Bamford" "Koblick" ...
 $ V5: Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
 $ V6: Date, format: "1986-06-26" ...

By specifying the colClasses argument, the six columns V1 to V6 (which hold emp_no, birth_date, first_name, last_name, gender, and hire_date) are read as integer type, date type, character type, character type, factor type, and date type, respectively.
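The following self-contained sketch combines colClasses with the col.names argument so that the columns carry meaningful names instead of V1 to V6. It reads from an inline string through the text argument of read.csv rather than from a file, and the two rows are illustrative values assembled from the output above, not the actual file contents.

```r
# Two illustrative rows in the same column order as employees.csv (assumed layout).
csv_text <- "10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1952-04-19,Bezalel,Simmel,F,1985-02-18"

employees_demo <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("emp_no", "birth_date", "first_name",
                "last_name", "gender", "hire_date"),
  # NA lets R infer the type of emp_no; the rest are set explicitly.
  colClasses = c(NA, "Date", "character", "character", "factor", "Date")
)

str(employees_demo)
```

Setting the types at import time avoids the separate as.Date and as.character calls from the main steps, at the cost of needing to know the column order in advance.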
