  • Learning Spark SQL
  • Aurobindo Sarkar

Executing other miscellaneous processing steps

If required, we can execute a few more steps to cleanse the data further, compute additional aggregations, convert to a type-safe data structure, and so on.

We can drop the time column and aggregate each day's readings using aggregation functions such as sum and average. Here, we rename the aggregated columns with a d prefix to denote daily values.
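The drop-and-aggregate step might be sketched as follows; the input name `cleanedDf` and the reading column names are assumptions for illustration, not the book's exact schema:

```scala
import org.apache.spark.sql.functions._

// Minimal sketch, assuming cleanedDf holds the cleansed readings with
// year/month/day columns plus numeric measurement columns.
val finalDayDf1 = cleanedDf
  .drop("time")                                        // time-of-day is no longer needed
  .groupBy("year", "month", "day")                     // one group per day of readings
  .agg(
    sum("global_active_power").as("d_active_power"),   // daily total consumption
    avg("voltage").as("d_avg_voltage")                 // daily average voltage
  )
```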

We display a few sample records from this DataFrame:

scala> finalDayDf1.show(5)

Here, we group the readings by year and month, count the number of readings, and display the count for each month. The first month's count is low because data capture began midway through that month.
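The per-month count described above might look like this; `readingsDf` stands in for whichever pre-aggregation DataFrame carries the year and month columns:

```scala
// Sketch: count the readings captured in each year/month combination.
val monthlyCounts = readingsDf
  .groupBy("year", "month")
  .count()                     // number of readings per month
  .orderBy("year", "month")

monthlyCounts.show()
```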

We can also convert our DataFrame to a Dataset using a case class, as follows:
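A conversion along these lines would work; the case class name and its fields are illustrative and must match the DataFrame's actual schema for `as[]` to succeed:

```scala
// Hypothetical case class mirroring the daily-aggregate schema.
case class DailyReading(
  year: Int,
  month: Int,
  day: Int,
  d_active_power: Double,
  d_avg_voltage: Double
)

import spark.implicits._
// Typed Dataset[DailyReading] backed by the same data.
val finalDayDs = finalDayDf1.as[DailyReading]
```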

At this stage, we have completed all the steps for pre-processing the household electric consumption Dataset. We now shift our focus to processing the weather Dataset.
