- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:05
Data munging
The raw data for a problem often comes from multiple sources with different, and often incompatible, formats. The beauty of the Spark programming model is that it lets you define data operations that process the incoming data and transform it into a regular form suitable for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power (and necessity!) of data munging is by example. So take heart: this book offers plenty of practice that emphasizes this essential process.
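As a small taste of the idea, here is a minimal sketch of munging two incompatible sources into one regular form. It uses plain Python rather than Spark so that it runs anywhere. The two inline sources, their field names, and the `from_csv`, `from_json_lines`, and `munge` helpers are all hypothetical, made up for this illustration. Each source gets its own small transformation, and both produce records with the same `id`, `signup`, and `amount` fields.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source A: CSV with US-style dates and amounts in dollars.
CSV_SOURCE = """id,signup,amount
1,07/02/2021,10.50
2,12/31/2020,3
"""

# Hypothetical source B: JSON lines with ISO dates and amounts in cents.
JSON_SOURCE = """{"user_id": 3, "signup_date": "2021-01-15", "amount_cents": 725}
{"user_id": 4, "signup_date": "2021-03-09", "amount_cents": 1200}
"""


def from_csv(text):
    """Map each CSV row to the common (id, signup, amount) record."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "id": int(row["id"]),
            "signup": datetime.strptime(row["signup"], "%m/%d/%Y").date(),
            "amount": float(row["amount"]),
        }


def from_json_lines(text):
    """Map each JSON line to the same common record, converting cents to dollars."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        yield {
            "id": rec["user_id"],
            "signup": datetime.strptime(rec["signup_date"], "%Y-%m-%d").date(),
            "amount": rec["amount_cents"] / 100.0,
        }


def munge(csv_text, json_text):
    """Combine both sources into one list of regular records, ordered by id."""
    records = list(from_csv(csv_text)) + list(from_json_lines(json_text))
    return sorted(records, key=lambda r: r["id"])


records = munge(CSV_SOURCE, JSON_SOURCE)
for record in records:
    print(record)
```

Once every record shares the same fields, types, and units, later steps such as feature engineering can treat the data as a single table, whichever source a row came from.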