  • Learning Spark SQL
  • Aurobindo Sarkar
  • 115 words
  • 2021-07-02 18:23:49

Using Spark SQL for Data Munging

In this code-intensive chapter, we will present key data munging techniques used to transform raw data into a format that is usable for analysis. We start with some general data munging steps that apply to a wide variety of scenarios. Then we shift our focus to specific types of data, including time-series data and text, and to the data preprocessing steps required by machine learning pipelines based on Spark MLlib. We will use several datasets to illustrate these techniques.

In this chapter, we shall cover the following topics:

  • What is data munging?
  • Exploring data munging techniques
  • Combining data using joins
  • Munging textual data
  • Munging time-series data
  • Dealing with variable-length records
  • Data preparation for machine learning pipelines