- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:29
Dropping data
In the previous recipes, we introduced how to revise and filter datasets. These steps nearly complete the data preprocessing and preparation phase. However, we may still find bad data within our dataset, and we should discard such bad data or unwanted records to prevent them from producing misleading results. Here, we introduce some practical methods to remove this unnecessary data.
Getting ready
Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to drop attributes and rows from the current dataset:
- First, you can drop the last_name column by excluding its column index from the filtered subset:
> employees <- employees[,-5]
- Or, you can assign NULL to the attribute you wish to drop:
> employees$hire_date <- NULL
- To drop rows, you can pass the negative indexes of the rows that you want to drop:
> employees <- employees[c(-2,-4,-6),]
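The three steps above can be tried without the original files. The following is a minimal sketch on a toy stand-in for employees: the column order (chosen so that last_name sits at index 5) and the placeholder dates are assumptions made for illustration, not the recipe's actual data.

```r
# Toy stand-in for the employees data frame.
# Column order and the placeholder dates are assumptions for illustration only.
employees <- data.frame(
  emp_no     = 10001:10006,
  birth_date = as.Date(rep("1990-01-01", 6)),  # placeholder values
  hire_date  = as.Date(rep("2010-01-01", 6)),  # placeholder values
  first_name = c("Georgi", "Bezalel", "Parto", "Chirstian", "Kyoichi", "Anneke"),
  last_name  = c("Facello", "Simmel", "Bamford", "Koblick", "Maliniak", "Preusig"),
  gender     = c("M", "F", "M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Step 1: drop the fifth column (last_name in this toy layout).
employees <- employees[, -5]

# Step 2: drop hire_date by assigning NULL to it.
employees$hire_date <- NULL

# Step 3: drop rows 2, 4, and 6 with negative row indexes.
employees <- employees[c(-2, -4, -6), ]

print(names(employees))
print(nrow(employees))
```

Because a positional index such as -5 depends on the current column order, it is worth checking names(employees) before running the first step on real data.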
How it works…
The idea of dropping data is very similar to data filtering; you only need to specify the negative index of the rows (or columns) that you want to drop during the filtering. Then, you can replace the original dataset with the filtered subset. Thus, as the last_name column is at the fifth index, you can remove the attribute by specifying -5 on the right-hand side of the comma within the square brackets. In addition to reassignment, you can also assign NULL to the attribute that you want to drop. As for removing rows, you can place negative indexes on the left-hand side of the comma within the square brackets, and then replace the original dataset with the filtered subset.
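Negative positions work, but they silently depend on column and row order. As a hedged complement to the explanation above, the sketch below drops a column by name and rows by a logical condition. The data frame df and its values are hypothetical, and the name-based approach is an alternative offered here, not the recipe's own method.

```r
# Hypothetical toy data frame; names and values are placeholders.
df <- data.frame(
  emp_no     = 10001:10004,
  first_name = c("Georgi", "Bezalel", "Parto", "Chirstian"),
  last_name  = c("Facello", "Simmel", "Bamford", "Koblick"),
  gender     = c("M", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# Drop a column by name, so the result does not depend on its position.
no_last <- df[, setdiff(names(df), "last_name"), drop = FALSE]

# drop = FALSE keeps a data frame even when a single column remains.
only_id <- df[, "emp_no", drop = FALSE]

# Drop rows with a logical condition instead of negative positions.
not_f <- df[df$gender != "F", ]

# Pitfall: a negative index built from which() removes every row when
# nothing matches, because -integer(0) is still an empty index.
no_match <- which(df$gender == "X")
all_gone <- df[-no_match, ]            # zero rows remain
all_kept <- df[!(df$gender == "X"), ]  # all rows remain
```

The logical form in the last line is usually the safer choice when the set of rows to drop may be empty.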
There's more…
In addition to data filtering or assigning NULL to a specific attribute, you can use the within function to remove unwanted attributes. All you need to do is place the unwanted attribute names inside the rm function:
> within(employees, rm(birth_date, hire_date))
   emp_no first_name last_name gender
1   10001     Georgi   Facello      M
2   10002    Bezalel    Simmel      F
3   10003      Parto   Bamford      M
4   10004  Chirstian   Koblick      M
5   10005    Kyoichi  Maliniak      M
6   10006     Anneke   Preusig      F
7   10007    Tzvetan Zielinski      F
8   10008     Saniya  Kalloufi      M
9   10009     Sumant      Peac      F
10  10010  Duangkaew  Piveteau      F
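Note that within returns a modified copy and leaves its input untouched, so the result has to be assigned if the change should persist. The sketch below illustrates this on a hypothetical data frame emp with placeholder dates; the subset call at the end is an additional base R alternative, not part of the recipe.

```r
# Hypothetical toy data frame with placeholder dates.
emp <- data.frame(
  emp_no     = 10001:10003,
  birth_date = as.Date(rep("1990-01-01", 3)),
  hire_date  = as.Date(rep("2010-01-01", 3)),
  first_name = c("Georgi", "Bezalel", "Parto"),
  stringsAsFactors = FALSE
)

# within() returns a new data frame; emp itself is not changed.
trimmed <- within(emp, rm(birth_date, hire_date))

print(names(emp))      # still has all four columns
print(names(trimmed))  # birth_date and hire_date are gone

# Alternative from base R: subset() with a negated select.
via_subset <- subset(emp, select = -c(birth_date, hire_date))
```

To make the removal permanent, assign the result back to the same name, for example emp <- within(emp, rm(birth_date, hire_date)).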