Data Preparation

The Data Preparation phase covers all activities needed to construct the final dataset (the data that will be fed into the modeling tools) from the initial raw data. Data Preparation is often described as the most labor-intensive phase for the data analyst. It is critical that Data Preparation is done well, and a substantial portion of this book is dedicated to it. We cover cleaning and selecting data in Chapter 5, Cleaning and Selecting Data; integrating data in Chapter 6, Combining Data Files; and constructing data in Chapter 7, Deriving New Fields. However, a book dedicated to the basics of data mining can only start you on your journey when it comes to Data Preparation, since there are so many ways in which you can improve and prepare data. When you are ready for a more advanced treatment of this topic, two resources go into Data Preparation in much more depth, and both have extensive Modeler software examples: The IBM SPSS Modeler Cookbook (Packt Publishing) and Effective Data Preparation (Cambridge University Press).

The five Data Preparation tasks are:

  • Select data
  • Clean data
  • Construct data
  • Integrate data
  • Format data
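
To make the five tasks concrete, here is a minimal sketch in plain Python. It is not how the tasks are carried out in Modeler, which uses nodes on a stream rather than code; it only illustrates what each task does to the data. The records, the field names (id, age, region, notes, spend, senior), and the prepare function are all hypothetical and invented for this illustration, and the cleaning rule (dropping records with a missing age) is just one of many possible policies.

```python
# Hypothetical raw inputs; none of these records come from the book.
customers = [
    {"id": 1, "age": "34", "region": "north", "notes": "x"},
    {"id": 2, "age": "", "region": "south", "notes": "y"},
    {"id": 3, "age": "51", "region": "north", "notes": "z"},
]
purchases = {1: 120.0, 2: 80.0, 3: 45.5}  # total spend keyed by customer id


def prepare(customers, purchases):
    """Walk each raw record through the five Data Preparation tasks."""
    rows = []
    for rec in customers:
        # Select data: keep only the fields assumed relevant to modeling.
        row = {key: rec[key] for key in ("id", "age", "region")}

        # Clean data: skip records with a missing age, then fix the type.
        if row["age"] == "":
            continue
        row["age"] = int(row["age"])

        # Construct data: derive a new field from an existing one.
        row["senior"] = row["age"] >= 50

        # Integrate data: bring in spend from the second source.
        row["spend"] = purchases.get(row["id"], 0.0)

        rows.append(row)

    # Format data: fixed column order and sorted rows, with no change
    # to the meaning of the values.
    columns = ("id", "age", "region", "spend", "senior")
    rows.sort(key=lambda r: r["id"])
    return [tuple(r[c] for c in columns) for r in rows]


dataset = prepare(customers, purchases)
for record in dataset:
    print(record)
```

In this sketch the second customer is dropped during cleaning, so the final dataset holds two records with five fields each. In practice the tasks are rarely this tidy or this linear, and you will often loop back to an earlier task after seeing the results of a later one.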