
  • Learning Spark SQL
  • Aurobindo Sarkar
  • 223 words
  • 2021-07-02 18:23:48

Using Spark SQL for creating pivot tables

Pivot tables create alternate views of your data and are commonly used during data exploration. In the following example, we demonstrate pivoting using Spark DataFrames:

The following example pivots on whether a housing loan was taken and computes the counts by marital status:
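The original code listing did not survive extraction. A minimal sketch of this step, using a small hypothetical in-memory sample of the bank marketing data (the chapter itself loads the full dataset from a CSV file), could look like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotExample").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample rows; the real dataset has many more columns and records.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Pivot on the housing-loan field and count records per marital status.
val counts = df.groupBy("marital").pivot("housing", Seq("yes", "no")).count()
counts.show()
```

Listing the pivot values explicitly (`Seq("yes", "no")`) avoids an extra pass over the data to discover the distinct values of the pivot column.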

In the next example, we create a DataFrame with appropriate column names for the total and average number of calls:
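The listing for this step is also missing; a sketch under the same assumptions (here taking the `campaign` column, the number of contacts made, as the hypothetical "number of calls" field) might be:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("CallCounts").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample; `campaign` stands in for the number of calls.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Aggregate with descriptive column names via alias().
val callStats = df.groupBy("job").agg(
  sum("campaign").alias("TotalCalls"),
  avg("campaign").alias("AvgCalls"))
callStats.show()
```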

In the following example, we create a DataFrame with appropriate column names for the total and average duration of calls for each job category:
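A sketch of this step, again on a hypothetical sample where `duration` holds the call duration in seconds (as in the bank marketing dataset), could be:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("CallDurations").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Total and average call duration per job category.
val durStats = df.groupBy("job").agg(
  sum("duration").alias("TotalDuration"),
  avg("duration").alias("AvgDuration"))
durStats.show()
```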

In the following example, we show pivoting to compute average call duration for each job category, while also specifying a subset of marital status:
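A possible shape for this listing, restricting the pivot to the `married` and `single` values of `marital` (the value subset here is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotSubset").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Average call duration per job, pivoted on a subset of marital statuses.
// Rows with other statuses (e.g. "divorced") are excluded from the pivot columns.
val avgDur = df.groupBy("job")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
avgDur.show()
```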

The following example is the same as the preceding one, except that we split the average call duration values by the housing loan field as well in this case:
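One way to express the extra split is to add the housing-loan field to the grouping key, so each (job, housing) pair gets its own row of pivoted averages; this is a sketch, not necessarily the book's exact formulation:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotByHousing").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Same pivot as before, but grouped by housing loan as well;
// combinations with no data show up as null.
val avgByHousing = df.groupBy("job", "housing")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
avgByHousing.show()
```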

Next, we show how you can create a DataFrame containing a pivot table of term deposits subscribed by month, save it to disk, and read it back into an RDD:
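A sketch of the save-and-reload round trip; the output path is hypothetical, and `na.fill(0)` is added so that month/deposit combinations with no records serialize as zeros rather than empty fields:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotToDisk").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Pivot term-deposit subscriptions ("yes"/"no") by month.
val pivotDF = df.groupBy("month")
  .pivot("deposit", Seq("yes", "no"))
  .count()
  .na.fill(0)

// Save to disk as CSV and read the lines back as an RDD (path is illustrative).
pivotDF.write.mode("overwrite").csv("/tmp/monthly_deposits")
val monthlyRDD = spark.sparkContext.textFile("/tmp/monthly_deposits")
monthlyRDD.collect().foreach(println)
```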

Further, we use the RDD from the preceding step to compute quarterly totals of customers who did and did not subscribe to term deposits:
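A sketch of the quarterly roll-up. To keep this fragment self-contained it takes the pivoted DataFrame's `.rdd` directly rather than re-reading the CSV from disk as the chapter does; the month-to-quarter mapping is the standard calendar one:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("QuarterlyTotals").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

val pivotDF = df.groupBy("month")
  .pivot("deposit", Seq("yes", "no"))
  .count()
  .na.fill(0)

// Map each month abbreviation to its calendar quarter.
val quarterOf = Map(
  "jan" -> "Q1", "feb" -> "Q1", "mar" -> "Q1",
  "apr" -> "Q2", "may" -> "Q2", "jun" -> "Q2",
  "jul" -> "Q3", "aug" -> "Q3", "sep" -> "Q3",
  "oct" -> "Q4", "nov" -> "Q4", "dec" -> "Q4")

// Sum subscribed/not-subscribed counts per quarter.
val quarterly = pivotDF.rdd
  .map(r => (quarterOf(r.getString(0)), (r.getLong(1), r.getLong(2))))
  .reduceByKey { case ((y1, n1), (y2, n2)) => (y1 + y2, n1 + n2) }
quarterly.collect().sortBy(_._1).foreach(println)
```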

We will present a detailed analysis of other types of data, including streaming data, large-scale graphs, and time-series data, later in this book.
