
Introduction

Some studies estimate that data preparation activities account for 80 percent of the time invested in data science projects.

This number will probably not surprise you. Data preparation is the phase of a data science project in which you take data from the chaotic world around you and fit it into precise structures and standards.

This is far from a simple task, and it involves a large number of techniques that let you change the structure of your data and ensure that you can work with it.

This chapter presents recipes that will enable you to prepare the data you acquired in the previous chapter, no matter how it was structured when you loaded it into R.

We will look at the two main activities performed during the data preparation phase:

  • Data cleansing: This involves the identification and treatment of outliers and missing values
  • Data manipulation: Here, the main aim is to make the data structure conform to specific rules so that it can be used for analysis
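
To make the two activities concrete, here is a minimal base R sketch. It is not one of the chapter's recipes: the `sales` data frame, its column names, the 1.5 * IQR outlier rule, and the median imputation are all assumptions chosen for illustration.

```r
# Hypothetical toy data: one missing value and one extreme value in q1
sales <- data.frame(
  customer = c("a", "b", "c", "d", "e"),
  q1 = c(10, 12, NA, 11, 500),
  q2 = c(9, 13, 12, 10, 11)
)

# --- Data cleansing ---
# Identify missing values
missing_q1 <- is.na(sales$q1)

# Identify outliers with the 1.5 * IQR rule (an assumed, common convention)
q <- quantile(sales$q1, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[2] - q[1])
outlier_q1 <- !missing_q1 & (sales$q1 < q[1] - fence | sales$q1 > q[2] + fence)

# Treat both cases: mark outliers as missing, then impute the median
cleaned <- sales
cleaned$q1[outlier_q1] <- NA
cleaned$q1[is.na(cleaned$q1)] <- median(cleaned$q1, na.rm = TRUE)

# --- Data manipulation ---
# Reshape from wide (one column per quarter) to long (one row per
# customer and quarter), a structure many analysis functions expect
long <- reshape(
  cleaned,
  direction = "long",
  varying = c("q1", "q2"),
  v.names = "amount",
  timevar = "quarter",
  times = c("q1", "q2"),
  idvar = "customer"
)
```

Whether an extreme value should be removed, capped, or kept depends on the data and the analysis, so the treatment shown here is only one possible choice.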