  • Machine Learning with R
  • Brett Lantz

Chapter 2. Managing and Understanding Data

A key early component of any machine learning project involves managing and understanding the data you have collected. Although you may not find it as gratifying as building and deploying models—the stages in which you begin to see the fruits of your labor—you cannot ignore the preparatory work.

Any learning algorithm is only as good as its input data, and in many cases, input data is complex, messy, and spread across multiple sources and formats. Because of this complexity, the largest portion of effort invested in machine learning projects is spent on the data preparation and exploration process.

This chapter is divided into three main sections. The first section discusses the basic data structures R uses to store data. You will become very familiar with these structures as you create and manipulate datasets. The second section is practical: it covers several functions that are useful for getting data into and out of R. The third section illustrates methods for understanding data by exploring a real-world dataset.

By the end of this chapter, you will understand:

  • The basic R data structures and how to use them to store and extract data
  • How to get data into R from a variety of source formats
  • Common methods for understanding and visualizing complex data

Since the way R thinks about data will define the way you think about data, it is helpful to understand the basic R data structures before jumping into data preparation. However, if you are already familiar with R data structures, feel free to skip ahead to the section on data preprocessing.

主站蜘蛛池模板: 陆丰市| 镇宁| 平舆县| 南澳县| 无锡市| 开鲁县| 建德市| 乳源| 开阳县| 东至县| 阿图什市| 东乌| 平利县| 德保县| 深圳市| 安新县| 来宾市| 太仆寺旗| 潼南县| 明水县| 双峰县| 自治县| 兴和县| 嵊州市| 邻水| 浑源县| 天台县| 福安市| 山东省| 湘潭县| 清远市| 博罗县| 远安县| 临泉县| 阿坝县| 齐河县| 邵阳县| 漠河县| 孙吴县| 新民市| 岑巩县|