  • Advanced Machine Learning with R
  • Cory Lesmeister Dr. Sunil Kumar Chinnamgari
  • 2021-06-24 14:24:34

Summary

This chapter looked at the problems that commonly arise in the large, messy datasets typical of machine learning projects. These include, but are not limited to, the following:

  • Missing or invalid values
  • Novel levels in a categorical feature that appear only once the algorithm is in production
  • High cardinality in categorical features, such as zip codes
  • High dimensionality
  • Duplicate observations
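As a quick recap, here is a minimal base-R sketch (not the chapter's own code) of one way to flag each of these problems. The toy data frames `train` and `prod` are hypothetical. Treating character columns as the categorical features is also an assumption. Comparing the feature count with the observation count is only a rough signal of high dimensionality.

```r
# Hypothetical toy data: `train` stands in for the modeling data,
# `prod` for new data arriving once the model is in production.
train <- data.frame(
  zip   = c("10001", "10002", "10002", NA),
  color = c("red", "blue", "blue", "green"),
  x     = c(1.5, NA, NA, 3.0),
  stringsAsFactors = FALSE
)
prod <- data.frame(
  zip   = c("10001", "99999"),
  color = c("red", "purple"),
  x     = c(2.0, 4.0),
  stringsAsFactors = FALSE
)

# Missing values: count of NAs in each column
missing_counts <- colSums(is.na(train))

# Duplicate observations: rows identical to an earlier row
n_duplicates <- sum(duplicated(train))

# Assumption: character columns are the categorical features
cat_cols <- names(train)[vapply(train, is.character, logical(1))]

# Cardinality: number of distinct non-missing values per categorical column
cardinality <- vapply(train[cat_cols],
                      function(v) length(unique(v[!is.na(v)])),
                      integer(1))

# Novel levels: values present in production data but never seen in training
novel_levels <- lapply(setNames(cat_cols, cat_cols), function(col) {
  seen <- train[[col]][!is.na(train[[col]])]
  new  <- prod[[col]][!is.na(prod[[col]])]
  setdiff(unique(new), unique(seen))
})

# Dimensionality: number of observations versus number of features
dims <- c(n_obs = nrow(train), n_features = ncol(train))
```

On this toy data, the sketch reports missing values in `zip` and `x` and one duplicated row. It also flags `"99999"` and `"purple"` as levels the training data never contained.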

This chapter provided a disciplined approach to dealing with these problems by showing how to explore the data, treat it, and create a dataframe that you can use to develop your learning algorithm. The approach is also flexible enough that you can modify the code to suit your circumstances. This methodology should make what many consider the most arduous, time-consuming, and least enjoyable part of the job a much easier task.

With this task behind us, we can now get started on our first modeling task using linear regression in the following chapter.
