Adding new records

If you are familiar with databases, you may already know how to use an insert operation to append a new record to a dataset, or an alter operation to add a new column (attribute) to a table. R supports both operations, and they are often simpler to perform. This recipe introduces the rbind and cbind functions, which let you append a new record or a new attribute to an existing dataset.

Getting ready

Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.

How to do it…

Perform the following steps to add a new record or new variable into the dataset:

  1. First, use rbind to append a new record to employees. Without an assignment, the combined result is only printed:
    > rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
    
  2. We can then assign the combined result of the employees data frame and the new record back to employees:
    > employees <- rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
    
  3. Besides adding a new record to the original dataset, we can add a new position attribute with NA as the default value:
    > cbind(employees, position = NA)
    
  4. Furthermore, we can add a new age attribute, based on a calculation using the current date and birth_date of each employee:
    > library(lubridate)  # provides interval, ymd, now, as.period, and year
    > span <- interval(ymd(employees$birth_date), now())
    > time_period <- as.period(span)
    > employees$age <- year(time_period)
    
  5. Alternatively, we can use the transform function to add multiple variables:
    > transform(employees, age = year(time_period), position = "RD", marital = NA)
    

How it works…

As with database operations, a new record must follow the schema of the dataset, that is, the number of attributes and the data type of each attribute. Here, we first used the rbind function to add a new record to a data frame. As the employees dataset consists of six columns, we passed a record with six values to rbind. The first column, emp_no, is in integer format, so we do not have to wrap the input value in single quotes. Note, however, that c() coerces values of mixed types to a single type (character, in this case), so it is worth checking the column types after the call. For the first_name and last_name attributes, we can input any character string as a value because we already converted these columns to character type. For the gender attribute, which is a factor, the value should be one of the existing levels, M or F.
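
The effect of the schema on rbind can be seen on a small example. The following is a minimal sketch that assumes a toy data frame, emp, with made-up values and only three of the six columns. It compares appending a record built with c() against appending a one-row data frame that carries a type for each column:

```r
# Toy stand-in for the employees data (hypothetical values, three columns only)
emp <- data.frame(
  emp_no     = c(10001L, 10002L),
  first_name = c("Georgi", "Bezalel"),
  gender     = factor(c("M", "F"), levels = c("M", "F")),
  stringsAsFactors = FALSE
)

# c() turns the mixed values into a character vector, so after rbind
# the emp_no column is no longer an integer column
via_vector <- rbind(emp, c(10011, "Jhon", "M"))
class(via_vector$emp_no)

# A one-row data frame states the type of each value, so the
# original column types survive the rbind call
new_row <- data.frame(
  emp_no     = 10011L,
  first_name = "Jhon",
  gender     = factor("M", levels = c("M", "F")),
  stringsAsFactors = FALSE
)
via_frame <- rbind(emp, new_row)
class(via_frame$emp_no)
```

If the column types matter for later steps, the one-row data frame is probably the safer choice. The vector form is shorter, but the affected columns may need to be converted again afterwards.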

In addition to adding a new record to a target dataset, we can add a new variable with the cbind function by supplying the variable name and a default value in the call. Here, we used NA as the default value for a new position variable. We can also derive the value of a new variable from other columns. In this demonstration, we first computed each employee's age as the period between their birth date and the current date. Then, we used the dollar sign to assign the computed value to a new attribute, age. Besides using the dollar sign, we can use the transform function to create the age, position, and marital variables in a single call. Note that cbind and transform return a new data frame, so the result must be assigned back to employees to keep the new variables.
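
The three ways of adding a variable can be compared side by side. This is a sketch on a toy data frame, emp, with made-up values. It assumes a fixed reference date, ref_date, instead of now() so that the result is reproducible, and it computes the age with base R only rather than with lubridate:

```r
# Toy stand-in for the employees data (hypothetical values)
emp <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02"))
)

# cbind returns a new data frame, so assign the result to keep the column
emp <- cbind(emp, position = NA)

# Age in whole years at a fixed reference date (an assumption for this sketch):
# take the difference in calendar years, then subtract one if the birthday
# has not yet occurred in the reference year
ref_date   <- as.Date("2020-01-01")
year_diff  <- as.integer(format(ref_date, "%Y")) -
              as.integer(format(emp$birth_date, "%Y"))
before_bday <- format(ref_date, "%m%d") < format(emp$birth_date, "%m%d")
emp$age <- year_diff - before_bday

# transform can set an existing column and add a new one in a single call;
# it also returns a new data frame
emp2 <- transform(emp, position = "RD", marital = NA)
names(emp2)
```

The dollar sign modifies emp in place, whereas cbind and transform leave emp unchanged unless their result is assigned.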

There's more…

Besides using the dollar sign and transform function, we can use the with function to create new variables:

> with(employees, year(birth_date))
 [1] 1953 1964 1959 1954 1955 1953 1957 1958 1952 1963
> employees$birth_year <- with(employees, year(birth_date))
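
A closely related base R function is within, which evaluates an expression inside the data frame and returns a modified copy. The sketch below uses a toy data frame, emp, with made-up values, and extracts the year with format instead of lubridate's year function so that it runs without extra packages:

```r
# Toy stand-in for the employees data (hypothetical values)
emp <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02"))
)

# with() evaluates the expression and returns its value; emp is unchanged
birth_years <- with(emp, as.integer(format(birth_date, "%Y")))

# within() returns a copy of emp with the new variable attached
emp <- within(emp, birth_year <- as.integer(format(birth_date, "%Y")))
names(emp)
```

With with, the assignment to a new column has to be written separately; within combines the two steps.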