  • Learning Spark SQL
  • Aurobindo Sarkar
  • 2021-07-02 18:23:43

Using Spark SQL for Processing Structured and Semistructured Data

In this chapter, we will familiarize you with using Spark SQL with different types of data sources and data storage formats. Spark provides simple, standard structures (that is, RDDs and DataFrames/Datasets) for working with both structured and semistructured data. We cover some of the data sources most commonly used in big data applications, such as relational databases, NoSQL databases, and files (CSV, JSON, Parquet, and Avro). Spark also allows you to define and use custom data sources. A series of hands-on exercises in this chapter will enable you to use Spark with these different data sources and data formats.

In this chapter, you will learn about the following topics:

  • Understanding data sources in Spark applications
  • Using JDBC to work with relational databases
  • Using Spark with MongoDB (NoSQL database)
  • Working with JSON data
  • Using Spark with Avro and Parquet Datasets