- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:28
Converting data types
If we do not specify data types during the import phase, R automatically assigns a type to each column of the imported dataset. However, if an assigned type differs from the intended one, further data manipulation becomes difficult. Thus, data type conversion is an essential step of the preprocessing phase.
Getting ready
Complete the previous recipe and import both employees.csv and salaries.csv into an R session. You must also specify column names for these two datasets to be able to perform the following steps.
How to do it…
Perform the following steps to convert the data type:
- First, examine the data type of a single attribute using the class function:
  > class(employees$birth_date)
  [1] "factor"
- You can also examine the types of all attributes at once using the str function:
  > str(employees)
  'data.frame': 10 obs. of 6 variables:
   $ emp_no    : int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
   $ birth_date: Factor w/ 10 levels "1952-04-19","1953-04-20",..: 3 10 8 4 5 2 6 7 1 9
   $ first_name: Factor w/ 10 levels "Anneke","Bezalel",..: 5 2 7 3 6 1 10 8 9 4
   $ last_name : Factor w/ 10 levels "Bamford","Facello",..: 2 9 1 4 5 8 10 3 6 7
   $ gender    : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
   $ hire_date : Factor w/ 10 levels "1985-02-18","1985-11-21",..: 3 2 4 5 9 7 6 10 1 8
- Then, convert both birth_date and hire_date to the date format:
  > employees$birth_date <- as.Date(employees$birth_date)
  > employees$hire_date <- as.Date(employees$hire_date)
- You also need to convert both first_name and last_name into character type:
  > employees$first_name <- as.character(employees$first_name)
  > employees$last_name <- as.character(employees$last_name)
- Again, you can use str to examine the dataset:
  > str(employees)
  'data.frame': 10 obs. of 6 variables:
   $ emp_no    : int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
   $ birth_date: Date, format: "1953-09-02" ...
   $ first_name: chr "Georgi" "Bezalel" "Parto" "Chirstian" ...
   $ last_name : chr "Facello" "Simmel" "Bamford" "Koblick" ...
   $ gender    : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
   $ hire_date : Date, format: "1986-06-26" ...
- Furthermore, you can convert from_date and to_date within salaries to date type:
  > salaries$from_date <- as.Date(salaries$from_date)
  > salaries$to_date <- as.Date(salaries$to_date)
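The steps above can be sketched end to end without the CSV files. This is a minimal sketch under stated assumptions: the two-row data frame is a hypothetical stand-in for the imported employees data (its values are illustrative), and stringsAsFactors = TRUE is set explicitly because, since R 4.0.0, string columns are no longer turned into factors by default.

```r
# Hypothetical stand-in for the imported employees data (illustrative values).
# stringsAsFactors = TRUE mimics the factor columns shown in the recipe.
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = c("1953-09-02", "1964-06-02"),
  first_name = c("Georgi", "Bezalel"),
  last_name  = c("Facello", "Simmel"),
  gender     = c("M", "F"),
  hire_date  = c("1986-06-26", "1985-11-21"),
  stringsAsFactors = TRUE
)

# Convert the date-like columns. Going through as.character() first makes
# the conversion behave the same for factor and character input.
date_cols <- c("birth_date", "hire_date")
employees[date_cols] <- lapply(
  employees[date_cols],
  function(x) as.Date(as.character(x))
)

# Convert the name columns from factor to character.
name_cols <- c("first_name", "last_name")
employees[name_cols] <- lapply(employees[name_cols], as.character)

# gender is left as a factor because it has a small, fixed set of values.
str(employees)
```

Looping over column names with lapply is only one way to avoid repeating the assignment for each column; the one-line-per-column form used in the steps is equivalent.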
How it works…
In this recipe, we demonstrated how to convert the data type of each attribute within a dataset. Before converting any attribute, you should first examine its current type. To identify the type of a single selected attribute, use the class function. To inspect the types of all attributes at once, use the str function.
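As an illustration of the two inspection approaches, the following sketch uses a hypothetical toy data frame (df and its columns are not part of the recipe's data). It also shows sapply with class, which is not used in the recipe but gives a compact per-column summary.

```r
# Hypothetical toy data frame, used only to illustrate type inspection.
df <- data.frame(
  id     = 1:3,
  joined = as.Date(c("2020-01-15", "2021-03-01", "2022-07-30")),
  name   = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# class() reports the type of one selected column.
print(class(df$joined))

# sapply() applies class() to every column and returns a named vector.
col_types <- sapply(df, class)
print(col_types)

# str() prints the type and the first few values of every column.
str(df)
```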
From the output of applying the str function to the employees data frame, we can see that both birth_date and hire_date are in factor type. However, if we need to calculate one's age with the birth_date attribute, we need to convert it to date format. Thus, we change both birth_date and hire_date to date format using the as.Date function.
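To see why the date type matters for an age calculation, consider this minimal sketch. The birth date is the first value shown in the str output; ref_date is a hypothetical, fixed reference date chosen so that the result is reproducible, and dividing by 365.25 is only an approximation of a calendar year.

```r
# Birth date taken from the str() output; ref_date is a hypothetical choice.
birth_date <- as.Date("1953-09-02")
ref_date   <- as.Date("2021-07-14")

# Subtracting two Date objects yields a difftime measured in days.
age_days <- as.numeric(ref_date - birth_date)

# Approximate age in whole years (365.25 days per year on average).
age_years <- floor(age_days / 365.25)
print(age_years)
```

In practice, Sys.Date() could replace ref_date to compute the age as of today.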
Also, because a factor restricts an attribute to a fixed set of levels, we cannot freely add a record whose value is not already among those levels. A new employee's first or last name is unlikely to match an existing level, so we convert last_name and first_name to the character type. We can then proceed to append a new record to the employees dataset in the next recipe. Finally, we also convert from_date and to_date of the salaries dataset to date type so that we can perform date calculations in the next recipe.
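The restriction that factors impose can be sketched as follows. The surnames come from the str output above, while "Smith" is a hypothetical new value used only for illustration.

```r
# Three surnames from the recipe's output, stored as a factor.
last_name <- factor(c("Facello", "Simmel", "Bamford"))

# Assigning a value that is not an existing level yields NA.
# R also emits an "invalid factor level" warning, suppressed here.
as_factor <- last_name
suppressWarnings(as_factor[4] <- "Smith")  # hypothetical new surname
print(as_factor)

# After conversion to character, any string can be appended.
as_char <- as.character(last_name)
as_char[4] <- "Smith"
print(as_char)
```

An alternative, not used in this recipe, is to extend the levels of the factor before assigning the new value.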
There's more…
Besides using the as.* functions (such as as.Date and as.character) to convert data types, you can specify the data type during the data import phase. Using the read.csv function as an example, you can specify the data types in the colClasses argument. If you want R to select the data type of a column automatically (for example, to read emp_no as integer type), simply specify NA for that column within colClasses:
> employees <- read.csv('~/Desktop/employees.csv', colClasses = c(NA, "Date", "character", "character", "factor", "Date"), header = FALSE)
> str(employees)
'data.frame': 10 obs. of 6 variables:
 $ V1: int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
 $ V2: Date, format: "1953-09-02" ...
 $ V3: chr "Georgi" "Bezalel" "Parto" "Chirstian" ...
 $ V4: chr "Facello" "Simmel" "Bamford" "Koblick" ...
 $ V5: Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 1 1
 $ V6: Date, format: "1986-06-26" ...
By specifying the colClasses argument, emp_no, birth_date, first_name, last_name, gender, and hire_date will be converted into integer type, date type, character type, character type, factor type, and date type respectively.
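The same idea can be tried without a file on disk. In this sketch, csv_text is a hypothetical two-row excerpt with illustrative values, passed through the text argument of read.csv. The col.names argument is added here as an assumption about how the column names might be supplied, so that the columns are not left as V1 to V6.

```r
# Hypothetical two-row excerpt in the same layout as employees.csv.
csv_text <- "10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1964-06-02,Bezalel,Simmel,F,1985-11-21"

employees <- read.csv(
  text = csv_text,   # read from a string instead of a file path
  header = FALSE,    # the data has no header row
  col.names = c("emp_no", "birth_date", "first_name",
                "last_name", "gender", "hire_date"),
  # NA lets R choose the type for emp_no; the rest are fixed explicitly.
  colClasses = c(NA, "Date", "character", "character", "factor", "Date")
)

str(employees)
```

Setting the types at import keeps the conversion in one place, at the cost of having to know the column order in advance.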