- Advanced Machine Learning with R
- Cory Lesmeister Dr. Sunil Kumar Chinnamgari
- 2021-06-24 14:24:34
Summary
This chapter looked at the problems commonly found in the large, messy datasets typical of machine learning projects. These include, but are not limited to, the following:
- Missing or invalid values
- Novel levels in a categorical feature that first show up once the algorithm is in production
- High cardinality in categorical features such as zip code
- High dimensionality
- Duplicate observations
This chapter provided a disciplined approach to dealing with these problems by showing how to explore the data, treat it, and create a dataframe that you can use for developing your learning algorithm. The approach is also flexible enough that you can modify the code to suit your circumstances. This methodology should make what many feel is the most arduous, time-consuming, and least enjoyable part of the job much easier.
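As a compact reminder of the kinds of treatment summarized above, here is a minimal sketch. It is not the chapter's R code: it uses plain Python with only the standard library, and every name in it (the toy `train` records, the columns `zip`, `color`, and `income`, and the helpers `drop_duplicates`, `impute_median`, `learn_levels`, and `encode_level`) is hypothetical. It covers duplicate observations, missing values, and rare or novel categorical levels, and leaves high dimensionality aside.

```python
import statistics
from collections import Counter

# Toy records; every column name and value here is made up for illustration.
train = [
    {"zip": "46201", "color": "red", "income": 52.0},
    {"zip": "46201", "color": "red", "income": 52.0},   # exact duplicate
    {"zip": "46202", "color": "blue", "income": None},  # missing value
    {"zip": "46201", "color": "red", "income": 48.0},
    {"zip": "90210", "color": "blue", "income": 61.0},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Fill missing values with the median and record where that happened."""
    median = statistics.median(r[column] for r in rows if r[column] is not None)
    return [
        {
            **r,
            column: median if r[column] is None else r[column],
            column + "_missing": int(r[column] is None),  # indicator column
        }
        for r in rows
    ]


def learn_levels(rows, column, min_count=2):
    """Return the levels seen at least min_count times in the training data."""
    counts = Counter(r[column] for r in rows)
    return {level for level, n in counts.items() if n >= min_count}


def encode_level(value, known_levels, other="other"):
    """Map rare or previously unseen levels to a single catch-all level."""
    return value if value in known_levels else other


clean = impute_median(drop_duplicates(train), "income")
zip_levels = learn_levels(clean, "zip")

print(len(clean))                          # rows left after de-duplication
print(clean[1])                            # the row whose income was imputed
print(encode_level("10001", zip_levels))   # a zip code never seen in training
```

Learning the set of allowed levels from the training data only, and sending everything else to a catch-all level, is one simple way to keep a level that first appears in production from breaking the scoring step. The `min_count` threshold of 2 is an arbitrary choice for this toy data.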
With this task behind us, we can now get started on our first modeling task using linear regression in the following chapter.