- Go Machine Learning Projects
- Xuanyi Chew
- 219 words
- 2021-06-10 18:46:34
Janitorial work
A large part of data science work is cleanup. In a productionized system, this data would typically be fetched directly from the database, already relatively clean (high-quality production data science work requires a database of clean data). However, we're not in production mode yet; we're still in the model-building phase. It helps to imagine that we are writing a program whose sole job is to clean data.
Let's look at our requirements, starting with the data. Each column is a variable: every column except the last is an independent variable, and the last column is the dependent variable. Some variables are categorical, and some are continuous. Our task is to write a function that converts the data, currently a [][]string, into a [][]float64.
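The overall shape of this conversion can be sketched in Go. The following is a minimal, deliberately naive sketch that assumes every cell already holds a number; the function name splitXY is hypothetical. It separates the independent variables from the dependent variable in the last column, and it returns an error as soon as it meets a cell that does not parse, which is exactly what categorical values and missing-value oddities will trigger.

```go
package main

import (
	"fmt"
	"strconv"
)

// splitXY is a hypothetical helper illustrating the column layout:
// every column except the last is an independent variable (X), and the
// last column is the dependent variable (y). This naive version assumes
// every cell parses as a float; any other cell produces an error.
func splitXY(records [][]string) ([][]float64, []float64, error) {
	X := make([][]float64, 0, len(records))
	y := make([]float64, 0, len(records))
	for i, row := range records {
		if len(row) < 2 {
			return nil, nil, fmt.Errorf("row %d: need at least 2 columns, got %d", i, len(row))
		}
		last := len(row) - 1
		features := make([]float64, last)
		for j := 0; j < last; j++ {
			v, err := strconv.ParseFloat(row[j], 64)
			if err != nil {
				return nil, nil, fmt.Errorf("row %d, column %d: %w", i, j, err)
			}
			features[j] = v
		}
		target, err := strconv.ParseFloat(row[last], 64)
		if err != nil {
			return nil, nil, fmt.Errorf("row %d, target column: %w", i, err)
		}
		X = append(X, features)
		y = append(y, target)
	}
	return X, y, nil
}
```

Records read with Go's encoding/csv package arrive as [][]string, so they could be passed straight in; on real data with categorical columns, this version will fail on the first non-numeric cell.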
To do that, every value must become a float64. For the continuous variables, this is easy: simply parse the string into a float. There are some oddities to handle, which I hope you spotted when you opened the file in a spreadsheet. The main pain, however, is converting categorical data to float64.
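For the continuous columns, one simple way to deal with oddities is sketched below. It assumes that missing values show up as empty cells or as the literal string NA (an assumption; check what your own file actually contains), and it fills them with the mean of the values that did parse. Mean imputation is just one option chosen for illustration, not necessarily the approach the book settles on; isMissing and parseContinuous are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// isMissing reports whether a cell should be treated as a missing value.
// The markers checked here ("" and "NA") are assumptions; adjust them to
// whatever oddities your own file contains.
func isMissing(cell string) bool {
	s := strings.TrimSpace(cell)
	return s == "" || s == "NA"
}

// parseContinuous converts one continuous column to float64, replacing
// missing cells with the mean of the cells that parsed (mean imputation).
// It returns an error if a non-missing cell is not a number, or if no
// cell in the column parses at all.
func parseContinuous(column []string) ([]float64, error) {
	out := make([]float64, len(column))
	var missing []int
	var sum float64
	n := 0
	for i, cell := range column {
		if isMissing(cell) {
			missing = append(missing, i)
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(cell), 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i, err)
		}
		out[i] = v
		sum += v
		n++
	}
	if n == 0 {
		return nil, errors.New("column has no parseable values")
	}
	mean := sum / float64(n)
	for _, i := range missing {
		out[i] = mean
	}
	return out, nil
}
```

Working column by column, rather than row by row as in a naive loop, is what makes imputation possible here: the mean can only be known after the whole column has been scanned.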
Fortunately for us, people much smarter than me figured this out decades ago. There exists an encoding scheme that lets categorical data play nicely with linear regression algorithms.
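A widely used scheme of this kind is one-hot encoding (called dummy coding when one category is dropped); treating it as the scheme meant here is an assumption. Each distinct category gets its own 0/1 indicator column. The sketch below, built around the hypothetical function oneHot, encodes a single categorical column.

```go
package main

import "sort"

// oneHot encodes a categorical column as indicator (0/1) columns, one per
// distinct category, in sorted order. If dropFirst is true, the first
// category's column is omitted (dummy coding): rows of that category
// become all zeros and act as the baseline.
// It returns the encoded rows and the category name for each column.
func oneHot(column []string, dropFirst bool) ([][]float64, []string) {
	seen := make(map[string]bool)
	for _, c := range column {
		seen[c] = true
	}
	categories := make([]string, 0, len(seen))
	for c := range seen {
		categories = append(categories, c)
	}
	// Sort so the column order does not depend on map iteration order.
	sort.Strings(categories)
	if dropFirst && len(categories) > 0 {
		categories = categories[1:]
	}
	index := make(map[string]int, len(categories))
	for i, c := range categories {
		index[c] = i
	}
	encoded := make([][]float64, len(column))
	for r, c := range column {
		row := make([]float64, len(categories))
		if i, ok := index[c]; ok {
			row[i] = 1
		}
		encoded[r] = row
	}
	return encoded, categories
}
```

The categories are sorted because Go randomizes map iteration order; without sorting, the indicator columns could come out in a different order on every run. Passing dropFirst as true avoids a perfectly collinear set of indicator columns when the regression also fits an intercept. Note that this sketch derives the categories from whatever column it is given, so encoding training and test data separately could produce mismatched columns; a fuller version would build the category list once from the training data and reuse it.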