After converting each attribute to its proper data type, we find that some attributes in employees and salaries are dates. We can therefore calculate the number of years between each employee's date of birth and the current date to estimate the employee's age. Here, we show how to use built-in date functions and the lubridate package to manipulate date data.
Getting ready
Refer to the previous recipe and convert each attribute of imported data into the correct data type. Also, you have to rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to work with the date format in employees and salaries:
We can add days to or subtract days from a date attribute using the following:
> employees$hire_date + 30
We can obtain time differences in days between hire_date and birth_date using the following:
> employees$hire_date - employees$birth_date
Time differences in days
 [1] 11985  7842  9765 11902 12653 13192 11586 13357
 [9] 11993  9581
Besides getting time differences in days, we can obtain differences in weeks using the difftime function:
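A minimal sketch of this step, assuming hire_date and birth_date are stored as Date objects. The two-row employees data frame below is made-up illustration data, not the recipe's dataset:

```r
# Hypothetical sample data standing in for the employees dataset
employees <- data.frame(
  birth_date = as.Date(c("1960-01-01", "1970-06-15")),
  hire_date  = as.Date(c("1990-01-01", "1995-06-15"))
)

# Time difference between hire_date and birth_date, expressed in weeks
age_at_hire_weeks <- difftime(employees$hire_date,
                              employees$birth_date,
                              units = "weeks")
print(age_at_hire_weeks)
```

Changing the units argument to "days" or "hours" reports the same difference in those units.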
How it works…
After following the previous recipes, both the employees and salaries data should have renamed columns, and each attribute should already be converted to its proper data type. As some of the attributes are dates, we can use date functions to calculate the time differences between them.
Date data supports arithmetic operations, so we can add days to or subtract days from its values. We first add 30 to hire_date and check that every hire date has moved forward by 30 days. Next, we calculate the time difference in days between the birth_date and hire_date attributes to find out how old each employee was when they started working at the company. However, subtraction only gives the difference in days; converting it to another unit takes further calculation. The difftime function can report the difference in other units (for example, hours, days, or weeks). Although difftime offers more units, it does not offer months or years, so obtaining those still requires additional calculation.
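The arithmetic described above can be sketched in base R as follows. The sample dates are made up, and dividing by 365.25 is only an approximation of a year (an assumption made here, not something the recipe prescribes):

```r
# Hypothetical sample dates for illustration
birth_date <- as.Date(c("1960-01-01", "1970-06-15"))
hire_date  <- as.Date(c("1990-01-01", "1995-06-15"))

# Adding a number to a Date shifts it by that many days
hire_plus_30 <- hire_date + 30

# Subtracting two Date vectors yields a difftime measured in days
diff_days <- hire_date - birth_date

# Approximate conversion from days to years (365.25 days per year on average)
approx_years <- as.numeric(diff_days, units = "days") / 365.25

print(hire_plus_30)
print(diff_days)
print(round(approx_years, 1))
```

Because calendar years vary in length, the converted value is close to, but not exactly, the number of calendar years elapsed.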
To simplify date computation, we can use the lubridate package, which provides convenient date operations. As the data is in year-month-day format, we can use the ymd function to parse it into a date object first (a POSIXct value in lubridate 1.3.3; newer releases return a Date by default). Then, we can use the interval function to calculate the time span between birth_date and hire_date. Subsequently, we can use the as.period function to express that span as a period. This allows us to use the year function to obtain the number of whole years between each employee's date of birth and hire date.
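The lubridate steps described above can be sketched as follows. The dates are made-up illustration data, and the variable names (span, span_period, age_at_hire) are chosen here for readability rather than taken from the recipe:

```r
library(lubridate)

# Hypothetical sample data in year-month-day text form
birth_date <- ymd(c("1960-01-01", "1970-06-15"))
hire_date  <- ymd(c("1990-03-01", "1995-06-14"))

# Time span between the date of birth and the hire date
span <- interval(birth_date, hire_date)

# Express the span as a period (years, months, days, ...)
span_period <- as.period(span)

# Extract the whole-year component: the age at which each person was hired
age_at_hire <- year(span_period)
print(span_period)
print(age_at_hire)
```

Unlike dividing a day count by an average year length, the period keeps calendar years, months, and days as separate components, so the year part counts only completed years.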
Finally, to calculate the current age of each employee, we can use the now function to obtain the current time. We then use interval to obtain the time span between the employee's birth date and the current time. Converting that span with as.period and applying the year function gives the employee's age in whole years.
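A minimal sketch of the age calculation, again using made-up birth dates. The result depends on the date on which the code is run, so no fixed output is shown:

```r
library(lubridate)

# Hypothetical birth dates for illustration
birth_date <- ymd(c("1960-01-01", "1970-06-15"))

# Interval from each birth date to the current time
span_to_now <- interval(birth_date, now())

# Whole years elapsed since birth, that is, the current age
age <- year(as.period(span_to_now))
print(age)
```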
There's more…
When using the lubridate package (version 1.3.3), you might encounter the following error message:
Error in (function (..., deparse.level = 1) : (converted from warning) number of columns of result is not a multiple of vector length (arg 3)
This error message occurs due to a locale configuration bug. You can fix the problem by setting locale to English_United States.1252:
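One way to apply this setting is with the base R Sys.setlocale function. This is a sketch that assumes a Windows system, where the locale name English_United States.1252 is recognized; other platforms use different locale names, and the call may have no effect there:

```r
# Windows-specific locale name taken from the text above; on other
# platforms the name may differ and the request may not be honored
result <- suppressWarnings(
  Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
)

# Sys.setlocale returns "" when the request cannot be honored
print(result)
```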