- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:29
Adding new records
If you are familiar with databases, you already know how to use an insert operation to append a new record to a dataset, or an alter operation to add a new column (attribute) to a table. R supports the equivalent of both operations, and more simply. This recipe introduces the rbind and cbind functions, which let you append a new record or a new attribute to an existing dataset.
Getting ready
Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to add a new record or new variable into the dataset:
- First, use rbind to insert a new record into employees. Without an assignment, the combined data frame is only printed, and employees itself is left unchanged:
  > rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
- We can then reassign the combination of the employees data frame and the new record back to employees:
  > employees <- rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
- Besides adding a new record to the original dataset, we can add a new position attribute with NA as the default value:
  > cbind(employees, position = NA)
- Furthermore, we can add a new age attribute, calculated from the current date and the birth_date of each employee (interval, ymd, now, as.period, and year come from the lubridate package):
  > library(lubridate)
  > span <- interval(ymd(employees$birth_date), now())
  > time_period <- as.period(span)
  > employees$age <- year(time_period)
- Alternatively, we can use the transform function to add multiple variables at once:
  > transform(employees, age = year(time_period), position = "RD", marital = NA)
How it works…
As with database operations, a new record added to a data frame must follow the schema of the dataset (the number of attributes and the data type of each attribute). Here, we first introduced how to use the rbind function to add a new record to a data frame. As the employees dataset consists of six columns, we can add a record with six values to it with rbind. The first column, emp_no, is in integer format, so we do not have to wrap its value in single quotes. For the first_name and last_name attributes, we can enter any character string, because we already converted these columns to character type. For the gender attribute, which is a factor, we can only enter M or F as a value.
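One caveat is worth illustrating: c() coerces its arguments to a single common type, so a record that mixes a number with quoted strings reaches rbind as all character values. The following is a minimal sketch, not the recipe's own method, that builds the new record as a one-row data frame so that each value keeps its intended type. The two existing rows are made-up stand-ins for the real employees data, and the names new_row and combined are assumptions introduced for illustration.

```r
# Toy stand-in for the employees data (values are made up for illustration)
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02")),
  first_name = c("Georgi", "Bezalel"),
  last_name  = c("Facello", "Simmel"),
  gender     = factor(c("M", "F"), levels = c("M", "F")),
  hire_date  = as.Date(c("1986-06-26", "1985-11-21")),
  stringsAsFactors = FALSE
)

# Build the new record with explicit types, one value per column
new_row <- data.frame(
  emp_no     = 10011L,
  birth_date = as.Date("1960-01-01"),
  first_name = "Jhon",
  last_name  = "Doe",
  gender     = factor("M", levels = c("M", "F")),
  hire_date  = as.Date("1988-01-01"),
  stringsAsFactors = FALSE
)

# rbind matches columns by name and keeps the column types
combined <- rbind(employees, new_row)
str(combined)
```

Calling str on the result is a quick way to check that no column changed type after the append.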
In addition to adding a new record to a target dataset, we can add a new variable with the cbind function. To do so, we pass the new variable together with a default value when calling cbind. Here, we use NA as the default value for the new position variable. We can also assign results calculated from other columns as the value of a new variable. In this demonstration, we first computed each employee's age as the period from their birth date to the current date. Then, we used the dollar sign to assign the computed value to a new attribute, age. Besides using the dollar sign, we can use the transform function to create the age, position, and marital variables in the employees dataset.
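For comparison, the age calculation can also be approximated without lubridate. The following is a rough base R sketch, not the recipe's method: age_on is a hypothetical helper, the birth dates are made up, and the reference date is fixed rather than taken from the current time so that the result is reproducible.

```r
# Hypothetical helper: completed years between a birth date and a reference date
age_on <- function(birth, ref) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(ref)
  years <- r$year - b$year
  # Subtract one year if the birthday has not yet occurred in the reference year
  before_birthday <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  years - as.integer(before_birthday)
}

# Made-up birth dates standing in for employees$birth_date
birth_date <- as.Date(c("1960-01-01", "1953-09-02"))
ref_date   <- as.Date("2021-07-14")   # fixed reference date (an assumption)

age <- age_on(birth_date, ref_date)
age
```

Replacing ref_date with Sys.Date() would give ages as of the day the code is run.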
There's more…
Besides using the dollar sign and the transform function, we can use the with function to create new variables:
> with(employees, year(birth_date))
[1] 1953 1964 1959 1954 1955 1953 1957 1958 1952 1963
> employees$birth_year <- with(employees, year(birth_date))
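The with function evaluates an expression using the columns of a data frame as variables, which is why birth_date can be written without the employees$ prefix. Here is a minimal base R sketch of the same idea, assuming a made-up two-row stand-in for employees and using format instead of lubridate's year so that it runs without extra packages.

```r
# Toy stand-in for the employees data (values are made up for illustration)
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02"))
)

# Inside with(), column names resolve against the data frame
employees$birth_year <- with(employees, as.integer(format(birth_date, "%Y")))

# The equivalent expression without with() needs the employees$ prefix
same_result <- as.integer(format(employees$birth_date, "%Y"))
identical(employees$birth_year, same_result)
```

Unlike transform, with returns only the value of the expression, so the assignment to employees$birth_year is what actually adds the column.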