Title: Learning Spark SQL
Author: Aurobindo Sarkar
Updated: 2021-07-02 18:23:43

Using Spark SQL for Processing Structured and Semistructured Data

In this chapter, we will familiarize you with using Spark SQL with different types of data sources and data storage formats. Spark provides easy-to-use, standard structures (that is, RDDs and DataFrames/Datasets) for working with both structured and semistructured data. We cover some of the data sources most commonly used in big data applications, such as relational databases, NoSQL databases, and files (CSV, JSON, Parquet, and Avro). Spark also allows you to define and use custom data sources. A series of hands-on exercises in this chapter will enable you to use Spark with different types of data sources and data formats.

In this chapter, you will learn about the following topics:

Understanding data sources in Spark applications
Using JDBC to work with relational databases
Using Spark with MongoDB (NoSQL database)
Working with JSON data
Using Spark with Avro and Parquet Datasets
