Data munging

Raw data for a problem often comes from multiple sources in different and frequently incompatible formats. The beauty of the Spark programming model is its ability to define data operations that process the incoming data and transform it into a regular form that can be used for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power (and necessity!) of data munging is by example. So take heart: this book offers plenty of practice that emphasizes this essential process.
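As a small illustration of the idea, here is a minimal sketch in plain Python rather than Spark. It assumes two hypothetical sources: a CSV extract with day/month/year dates and dollar amounts, and a JSON-lines feed with ISO dates and amounts in cents. Each source gets its own normalizing function, and the results are merged into one regular schema (`id`, `date`, `amount`). The data, field names, and helper functions (`from_csv`, `from_json`, `munge`) are all invented for illustration. In Spark, the per-record normalization would roughly correspond to transformations such as `map`, and the merge to `union`.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source A: CSV with day/month/year dates and amounts as strings.
CSV_SOURCE = """id,date,amount
1,03/01/2017,12.50
2,15/02/2017,7
"""

# Hypothetical source B: JSON lines with ISO dates and amounts in cents.
JSON_SOURCE = """{"user": "3", "ts": "2017-03-20", "amount_cents": 1999}
{"user": "4", "ts": "2017-04-01", "amount_cents": 250}
"""


def from_csv(text):
    """Normalize CSV rows into {id: int, date: ISO string, amount: dollars}."""
    for row in csv.DictReader(io.StringIO(text)):
        # Parse day/month/year and re-emit as an ISO 8601 date string.
        date = datetime.strptime(row["date"], "%d/%m/%Y").date()
        yield {"id": int(row["id"]), "date": date.isoformat(), "amount": float(row["amount"])}


def from_json(text):
    """Normalize JSON lines into the same schema as from_csv."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        # Rename fields and convert cents to dollars.
        yield {"id": int(rec["user"]), "date": rec["ts"], "amount": rec["amount_cents"] / 100.0}


def munge(csv_text, json_text):
    """Union both normalized sources into one list, sorted by id."""
    records = list(from_csv(csv_text)) + list(from_json(json_text))
    return sorted(records, key=lambda r: r["id"])


if __name__ == "__main__":
    for record in munge(CSV_SOURCE, JSON_SOURCE):
        print(record)
```

The design choice worth noting is that each source is normalized independently before the merge, so adding a third format only requires one more normalizing function.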
