
Dropping data

In the previous recipes, we introduced how to revise and filter datasets. These steps nearly complete the data preprocessing and preparation phase. However, the dataset may still contain bad data or unwanted records, which should be discarded so that they do not produce misleading results. Here, we introduce some practical methods for removing such data.

Getting ready

Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.

How to do it…

Perform the following steps to drop attributes or rows from the current dataset:

  1. First, you can drop the last_name column by excluding its index (here, the fifth column) from the filtered subset:
    > employees <- employees[,-5]
    
  2. Or, you can assign NULL to the attribute you wish to drop:
    > employees$hire_date <- NULL
    
  3. To drop rows, specify the negative indexes of the rows that you want to drop:
    > employees <- employees[c(-2,-4,-6),]
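
The three steps above can be tried on a small stand-in data frame. This is only a sketch: the toy employees data, its values, and its column order are made up for illustration, and the position of last_name is looked up by name rather than hard-coded, since the real column order may differ.

```r
# Toy stand-in for the employees data (illustrative values and column order)
employees <- data.frame(
  emp_no     = 10001:10006,
  first_name = c("Georgi", "Bezalel", "Parto", "Chirstian", "Kyoichi", "Anneke"),
  last_name  = c("Facello", "Simmel", "Bamford", "Koblick", "Maliniak", "Preusig"),
  hire_date  = as.Date(c("2001-01-01", "2001-02-01", "2001-03-01",
                         "2001-04-01", "2001-05-01", "2001-06-01")),
  stringsAsFactors = FALSE
)

# Step 1: drop a column by negative position.
# Look the position up by name and make sure exactly one column matched;
# an empty match would otherwise select zero columns.
pos <- which(names(employees) == "last_name")
stopifnot(length(pos) == 1)
employees <- employees[, -pos]

# Step 2: drop a column by assigning NULL to it
employees$hire_date <- NULL

# Step 3: drop rows 2, 4, and 6 with negative row indexes
employees <- employees[c(-2, -4, -6), ]

names(employees)   # "emp_no" "first_name"
employees$emp_no   # 10001 10003 10005
```

Note that the row names of the result remain 1, 3, and 5; dropping rows does not renumber them.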
    

How it works…

Dropping data works much like data filtering: you specify negative indexes for the rows or columns that you want to exclude, and then replace the original dataset with the filtered subset. Because the last_name column is at the fifth position, you can remove it by placing -5 on the right-hand side of the comma within the square brackets. This depends on the column order, so it is worth checking the position with names(employees) before dropping by index. Instead of reassigning the subset, you can also assign NULL to the attribute that you want to drop. To remove rows, place the negative indexes on the left-hand side of the comma within the square brackets, and then replace the original dataset with the filtered subset.
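
Two details of negative indexing are easy to miss, and the following sketch illustrates them on a hypothetical data frame df (its name and contents are assumptions, not part of the recipe). First, if only one column remains after dropping, R returns a plain vector unless drop = FALSE is given. Second, columns can be excluded by name with a logical mask, which does not depend on column positions.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(90, 85, 70),
  stringsAsFactors = FALSE
)

# Dropping all but one column collapses the result to a vector by default
v <- df[, -c(2, 3)]
is.data.frame(v)      # FALSE

# drop = FALSE keeps the result as a one-column data frame
kept <- df[, -c(2, 3), drop = FALSE]
is.data.frame(kept)   # TRUE

# Exclude columns by name; names that do not exist are simply ignored
drop_cols <- c("name", "not_a_column")
by_name <- df[, !(names(df) %in% drop_cols), drop = FALSE]
names(by_name)        # "id" "score"
```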

There's more…

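In addition to data filtering or assigning NULL to a specific attribute, you can use the within function to remove unwanted attributes. All you need to do is place the unwanted attribute names inside the rm function. Note that within returns a modified copy, so assign the result back to employees if you want to keep the change: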

> within(employees, rm(birth_date, hire_date))
   emp_no first_name last_name gender
1   10001     Georgi   Facello      M
2   10002    Bezalel    Simmel      F
3   10003      Parto   Bamford      M
4   10004  Chirstian   Koblick      M
5   10005    Kyoichi  Maliniak      M
6   10006     Anneke   Preusig      F
7   10007    Tzvetan Zielinski      F
8   10008     Saniya  Kalloufi      M
9   10009     Sumant      Peac      F
10  10010  Duangkaew  Piveteau      F
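
As a minimal sketch of how within behaves, the following uses a made-up three-row employees data frame (the dates are illustrative). It shows that the original data frame is left untouched until the result is assigned, and that base R's subset function with a negated select argument is one possible alternative for dropping columns by name.

```r
# Toy stand-in for the employees data (illustrative values)
employees <- data.frame(
  emp_no     = 10001:10003,
  birth_date = as.Date(c("1970-01-01", "1970-02-01", "1970-03-01")),
  first_name = c("Georgi", "Bezalel", "Parto"),
  hire_date  = as.Date(c("2001-01-01", "2001-02-01", "2001-03-01")),
  stringsAsFactors = FALSE
)

# within() returns a modified copy; employees itself is unchanged
trimmed <- within(employees, rm(birth_date, hire_date))
names(employees)   # "emp_no" "birth_date" "first_name" "hire_date"
names(trimmed)     # "emp_no" "first_name"

# An alternative in base R: subset() with a negated select
same <- subset(employees, select = -c(birth_date, hire_date))
names(same)        # "emp_no" "first_name"
```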