
  • Learning Spark SQL
  • Aurobindo Sarkar
  • 223 words
  • 2021-07-02 18:23:48

Using Spark SQL for creating pivot tables

Pivot tables create alternate views of your data and are commonly used during data exploration. In the following example, we demonstrate pivoting using Spark DataFrames:

The following example pivots on whether a housing loan was taken and computes the counts by marital status:
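The original code listing did not survive extraction. A minimal sketch of this step, using a small hypothetical in-memory sample of the bank marketing data (the chapter itself loads the full dataset from a CSV file), could look like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotExample").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample rows; the real dataset has many more columns and records.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Pivot on the housing-loan field and count records per marital status.
val counts = df.groupBy("marital").pivot("housing", Seq("yes", "no")).count()
counts.show()
```

Listing the pivot values explicitly (`Seq("yes", "no")`) avoids an extra pass over the data to discover the distinct values of the pivot column.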

In the next example, we create a DataFrame with appropriate column names for the total and average number of calls:
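The listing for this step is also missing; a sketch under the same assumptions (here taking the `campaign` column, the number of contacts made, as the hypothetical "number of calls" field) might be:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("CallCounts").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample; `campaign` stands in for the number of calls.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Aggregate with descriptive column names via alias().
val callStats = df.groupBy("job").agg(
  sum("campaign").alias("TotalCalls"),
  avg("campaign").alias("AvgCalls"))
callStats.show()
```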

In the following example, we create a DataFrame with appropriate column names for the total and average duration of calls for each job category:
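A sketch of this step, again on a hypothetical sample where `duration` holds the call duration in seconds (as in the bank marketing dataset), could be:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("CallDurations").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Total and average call duration per job category.
val durStats = df.groupBy("job").agg(
  sum("duration").alias("TotalDuration"),
  avg("duration").alias("AvgDuration"))
durStats.show()
```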

In the following example, we show pivoting to compute average call duration for each job category, while also specifying a subset of marital status:
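A possible shape for this listing, restricting the pivot to the `married` and `single` values of `marital` (the value subset here is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotSubset").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Average call duration per job, pivoted on a subset of marital statuses.
// Rows with other statuses (e.g. "divorced") are excluded from the pivot columns.
val avgDur = df.groupBy("job")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
avgDur.show()
```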

The following example is the same as the preceding one, except that we split the average call duration values by the housing loan field as well in this case:
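One way to express the extra split is to add the housing-loan field to the grouping key, so each (job, housing) pair gets its own row of pivoted averages; this is a sketch, not necessarily the book's exact formulation:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotByHousing").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Same pivot as before, but grouped by housing loan as well;
// combinations with no data show up as null.
val avgByHousing = df.groupBy("job", "housing")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
avgByHousing.show()
```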

Next, we show how you can create a DataFrame containing a pivot table of term deposits subscribed by month, save it to disk, and read it back into an RDD:
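A sketch of the save-and-reload round trip; the output path is hypothetical, and `na.fill(0)` is added so that month/deposit combinations with no records serialize as zeros rather than empty fields:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotToDisk").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

// Pivot term-deposit subscriptions ("yes"/"no") by month.
val pivotDF = df.groupBy("month")
  .pivot("deposit", Seq("yes", "no"))
  .count()
  .na.fill(0)

// Save to disk as CSV and read the lines back as an RDD (path is illustrative).
pivotDF.write.mode("overwrite").csv("/tmp/monthly_deposits")
val monthlyRDD = spark.sparkContext.textFile("/tmp/monthly_deposits")
monthlyRDD.collect().foreach(println)
```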

Further, we use the RDD from the preceding step to compute quarterly totals of customers who did and did not subscribe to term deposits:
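A sketch of the quarterly roll-up. To keep this fragment self-contained it takes the pivoted DataFrame's `.rdd` directly rather than re-reading the CSV from disk as the chapter does; the month-to-quarter mapping is the standard calendar one:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("QuarterlyTotals").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample of the bank marketing data.
val df = Seq(
  ("admin.",     "married",  "yes", 210, 1, "may", "no"),
  ("technician", "single",   "no",   95, 2, "jun", "yes"),
  ("admin.",     "married",  "no",  180, 1, "may", "yes"),
  ("technician", "divorced", "yes", 310, 3, "jul", "no"),
  ("admin.",     "single",   "yes", 120, 2, "may", "no"),
  ("technician", "married",  "yes", 260, 1, "oct", "yes")
).toDF("job", "marital", "housing", "duration", "campaign", "month", "deposit")

val pivotDF = df.groupBy("month")
  .pivot("deposit", Seq("yes", "no"))
  .count()
  .na.fill(0)

// Map each month abbreviation to its calendar quarter.
val quarterOf = Map(
  "jan" -> "Q1", "feb" -> "Q1", "mar" -> "Q1",
  "apr" -> "Q2", "may" -> "Q2", "jun" -> "Q2",
  "jul" -> "Q3", "aug" -> "Q3", "sep" -> "Q3",
  "oct" -> "Q4", "nov" -> "Q4", "dec" -> "Q4")

// Sum subscribed/not-subscribed counts per quarter.
val quarterly = pivotDF.rdd
  .map(r => (quarterOf(r.getString(0)), (r.getLong(1), r.getLong(2))))
  .reduceByKey { case ((y1, n1), (y2, n2)) => (y1 + y2, n1 + n2) }
quarterly.collect().sortBy(_._1).foreach(println)
```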

We will present a detailed analysis of other types of data, including streaming data, large-scale graphs, and time-series data, later in this book.
