  • Learning Spark SQL
  • Aurobindo Sarkar

Executing other miscellaneous processing steps

If required, we can execute a few more steps to cleanse the data further, compute additional aggregations, convert to a type-safe data structure, and so on.

We can drop the time column and aggregate each day's readings using aggregation functions such as sum and average. Here, we rename the aggregated columns with a d prefix to denote daily values.
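The drop-and-aggregate step might be sketched as follows; the input name `cleanedDf` and the reading column names are assumptions for illustration, not the book's exact schema:

```scala
import org.apache.spark.sql.functions._

// Minimal sketch, assuming cleanedDf holds the cleansed readings with
// year/month/day columns plus numeric measurement columns.
val finalDayDf1 = cleanedDf
  .drop("time")                                        // time-of-day is no longer needed
  .groupBy("year", "month", "day")                     // one group per day of readings
  .agg(
    sum("global_active_power").as("d_active_power"),   // daily total consumption
    avg("voltage").as("d_avg_voltage")                 // daily average voltage
  )
```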

We display a few sample records from this DataFrame:

scala> finalDayDf1.show(5)

Here, we group the readings by year and month, count the number of readings, and display the count for each month. The first month's count is low because data capture began midway through that month.
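The per-month count described above might look like this; `readingsDf` stands in for whichever pre-aggregation DataFrame carries the year and month columns:

```scala
// Sketch: count the readings captured in each year/month combination.
val monthlyCounts = readingsDf
  .groupBy("year", "month")
  .count()                     // number of readings per month
  .orderBy("year", "month")

monthlyCounts.show()
```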

We can also convert our DataFrame to a Dataset using a case class, as follows:
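A conversion along these lines would work; the case class name and its fields are illustrative and must match the DataFrame's actual schema for `as[]` to succeed:

```scala
// Hypothetical case class mirroring the daily-aggregate schema.
case class DailyReading(
  year: Int,
  month: Int,
  day: Int,
  d_active_power: Double,
  d_avg_voltage: Double
)

import spark.implicits._
// Typed Dataset[DailyReading] backed by the same data.
val finalDayDs = finalDayDf1.as[DailyReading]
```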

At this stage, we have completed all the steps for pre-processing the household electric consumption Dataset. We now shift our focus to processing the weather Dataset.
