- Learning Spark SQL
- Aurobindo Sarkar
- 2021-07-02 18:23:49
Using Spark SQL for Data Munging
In this code-intensive chapter, we present key data munging techniques used to transform raw data into a format usable for analysis. We start with general data munging steps that apply across a wide variety of scenarios. We then shift our focus to specific types of data, including time-series data and text, and to the data preprocessing steps required for Spark MLlib-based machine learning pipelines. We use several Datasets to illustrate these techniques.
In this chapter, we cover the following topics:
- What is data munging?
- Explore data munging techniques
- Combine data using joins
- Munging on textual data
- Munging on time-series data
- Dealing with variable-length records
- Data preparation for machine learning pipelines