
  • Learning Spark SQL
  • Aurobindo Sarkar
  • 2021-07-02 18:23:50

Pre-processing of the household electric consumption Dataset

Create a case class called HouseholdEPC to represent a household electric power consumption record.
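A minimal sketch of such a case class, assuming the input is the UCI Machine Learning Repository file household_power_consumption.txt with its nine semicolon-separated columns. The field names are illustrative (hypothetical); only the column order, the types, and the units noted in the comments follow the dataset description.

```scala
// Hypothetical field names: one field per column of household_power_consumption.txt, in file order.
case class HouseholdEPC(
  date: String,                // dd/mm/yyyy, kept as raw text
  time: String,                // hh:mm:ss, kept as raw text
  globalActivePower: Double,   // minute-averaged active power, kW
  globalReactivePower: Double, // minute-averaged reactive power, kW
  voltage: Double,             // minute-averaged voltage, V
  globalIntensity: Double,     // minute-averaged current intensity, A
  subMetering1: Double,        // sub-metered active energy, Wh
  subMetering2: Double,        // sub-metered active energy, Wh
  subMetering3: Double         // sub-metered active energy, Wh
)
```

Keeping date and time as strings mirrors the raw text; they can be converted to proper timestamp columns later with Spark SQL functions.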

Read the input Dataset into an RDD and count the number of rows in it.
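Reading the file with SparkContext.textFile yields an RDD[String] with one element per line, so the header is still present and is included in the count. The sketch below assumes a local SparkSession and takes the file path as its first argument; the object and method names are hypothetical. In spark-shell, where spark already exists, only the textFile and count calls are needed.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ReadHouseholdEPC {
  // Loads the raw text file as an RDD[String], one element per line (header included).
  def readLines(spark: SparkSession, path: String): RDD[String] =
    spark.sparkContext.textFile(path)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .appName("HouseholdEPC")
      .master("local[*]")
      .getOrCreate()

    // Assumed default path; pass the real location of the file as the first argument.
    val path = args.headOption.getOrElse("household_power_consumption.txt")
    val hhEPCRdd = readLines(spark, path)

    // count() is an action, so this is where the file is actually read.
    println(s"Lines read (including header): ${hhEPCRdd.count()}")
    spark.stop()
  }
}
```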

Next, remove the header and all other rows containing missing values (represented as "?" in the input).
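A sketch of this cleaning step, assuming the header is the first line of the file and that any "?" in a line marks a missing value. The names CleanHouseholdEPC, isComplete, and dropHeaderAndMissing are hypothetical.

```scala
import org.apache.spark.rdd.RDD

object CleanHouseholdEPC {
  // A line is complete when it contains no "?" (the dataset's missing-value marker).
  def isComplete(line: String): Boolean = !line.contains("?")

  // Drops the header (assumed to be the first line) and every line with a missing value.
  def dropHeaderAndMissing(lines: RDD[String]): RDD[String] = {
    val header = lines.first() // action: fetches the first line to the driver
    lines.filter(line => line != header && isComplete(line))
  }
}
```

For example, val hhEPCClean = CleanHouseholdEPC.dropHeaderAndMissing(hhEPCRdd). Comparing each line with the header string keeps the whole step to a single filter; it relies on the header text never reappearing as a data line, which is safe here because data lines start with a date rather than the column names.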

In the next step, convert the RDD[String] into an RDD of the HouseholdEPC case class we defined earlier, and then convert that RDD into a DataFrame of HouseholdEPC objects.
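One way to write the conversion, assuming the input is the cleaned RDD[String] (header and incomplete lines already removed) and the illustrative HouseholdEPC fields, repeated here so the block compiles on its own. Each line is split on ";", parsed into a HouseholdEPC, and the resulting RDD becomes a DataFrame through toDF(), which needs import spark.implicits._ in scope. ToHouseholdEPCDataFrame, parse, and toDF are hypothetical names.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same hypothetical fields as the case class sketch, repeated so this block stands alone.
case class HouseholdEPC(
  date: String, time: String,
  globalActivePower: Double, globalReactivePower: Double,
  voltage: Double, globalIntensity: Double,
  subMetering1: Double, subMetering2: Double, subMetering3: Double)

object ToHouseholdEPCDataFrame {
  // Parses one cleaned, semicolon-separated line into a HouseholdEPC record.
  def parse(line: String): HouseholdEPC = {
    val p = line.split(";").map(_.trim)
    HouseholdEPC(p(0), p(1), p(2).toDouble, p(3).toDouble, p(4).toDouble,
      p(5).toDouble, p(6).toDouble, p(7).toDouble, p(8).toDouble)
  }

  // RDD[String] -> RDD[HouseholdEPC] -> DataFrame (column names come from the case class fields).
  def toDF(spark: SparkSession, clean: RDD[String]): DataFrame = {
    import spark.implicits._ // provides the encoder and the rdd.toDF() conversion
    clean.map(parse).toDF()
  }
}
```

The case class is declared at the top level of the file on purpose: Spark derives the DataFrame schema from it by reflection, and a case class declared inside a method cannot be used that way.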

Display a few sample records from the DataFrame and count its rows to verify that the count matches the expected number of rows in your input Dataset.
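A sketch of the verification, assuming the expected count is the raw line count minus the header and minus every line that contains "?". Both helpers in VerifyHouseholdEPC are hypothetical; show, count, and filter are standard Spark operations.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object VerifyHouseholdEPC {
  // Expected DataFrame size: all lines, minus the header, minus lines with a "?" (missing value).
  def expectedRows(raw: RDD[String]): Long =
    raw.count() - 1 - raw.filter(_.contains("?")).count()

  // Prints a few sample rows, then compares the DataFrame's row count with the expected count.
  def showAndVerify(df: DataFrame, expected: Long, sampleRows: Int = 5): Boolean = {
    df.show(sampleRows)
    val actual = df.count()
    println(s"DataFrame rows: $actual, expected: $expected")
    actual == expected
  }
}
```

Computing the expected value from the raw RDD, rather than hard-coding it, keeps the check valid if the input file is replaced by a newer or partial copy.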
