- Learning Spark SQL
- Aurobindo Sarkar
- 2021-07-02 18:23:50
Pre-processing of the household electric consumption Dataset
Create a case class for household electric power consumption called HouseholdEPC:
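A minimal sketch of such a case class follows. The field names and types are assumptions: they mirror the nine semicolon-separated columns of the UCI household power consumption file (date, time, and seven numeric measurements), and may differ from the names used in the book's own listing.

```scala
// Assumed layout: one field per column of the input file.
// The short field names are illustrative, not taken from the source text.
case class HouseholdEPC(
  date: String,     // Date, dd/mm/yyyy
  time: String,     // Time, hh:mm:ss
  gap: Double,      // Global_active_power (kilowatts)
  grp: Double,      // Global_reactive_power (kilowatts)
  voltage: Double,  // Voltage (volts)
  gi: Double,       // Global_intensity (amperes)
  sm_1: Double,     // Sub_metering_1 (watt-hours)
  sm_2: Double,     // Sub_metering_2 (watt-hours)
  sm_3: Double      // Sub_metering_3 (watt-hours)
)
```

Keeping the date and time as strings defers any timestamp parsing to a later step; the numeric columns are typed as Double so that they can be aggregated directly.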

Read the input Dataset into an RDD and count the number of rows in it:
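One way this step might look is sketched below. The helper name `readLines` is hypothetical, and the session is assumed to run in local mode; in spark-shell the `spark` session already exists and the builder call simply returns it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Assumption: a local session. In spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .appName("HouseholdEPC")
  .master("local[*]")
  .getOrCreate()

// Hypothetical helper: each element of the returned RDD is one raw line of the file,
// including the header line.
def readLines(path: String): RDD[String] =
  spark.sparkContext.textFile(path)
```

Calling `readLines(path).count()` with the location of your copy of the file then returns the total number of lines, header included. Because `textFile` is lazy, a wrong path only surfaces as an error when `count()` runs.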


Next, remove the header row and every other row that contains a missing value (represented as ? in the input), as shown in the following steps:
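The filtering can be sketched as follows. To keep the example self-contained, a handful of illustrative sample lines stand in for the RDD read from the file; the sample rows and the variable names are assumptions made for this sketch.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HouseholdEPC")
  .master("local[*]")
  .getOrCreate()

// Illustrative stand-in for the RDD[String] read from the input file.
val lines = spark.sparkContext.parallelize(Seq(
  "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3",
  "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000",
  "21/12/2006;11:23:00;?;?;?;?;?;?;",
  "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000"
))

// The header is the first line of the input.
val header = lines.first()

val data = lines
  .filter(line => line != header)        // drop the header row
  .filter(line => !line.contains("?"))   // drop rows with any missing value
```

Comparing each line against the header string is simple but assumes no data row is identical to the header, which holds for this file.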


In the next step, convert the RDD[String] to an RDD of the case class defined earlier, and then convert that RDD to a DataFrame of HouseholdEPC objects.
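A sketch of the conversion is shown below, under the same assumptions as before: the case class fields follow the column order of the input file, and two illustrative sample rows stand in for the cleaned RDD.

```scala
import org.apache.spark.sql.SparkSession

// Assumed field layout; repeated here so the example runs on its own.
case class HouseholdEPC(date: String, time: String, gap: Double, grp: Double,
                        voltage: Double, gi: Double,
                        sm_1: Double, sm_2: Double, sm_3: Double)

val spark = SparkSession.builder()
  .appName("HouseholdEPC")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._   // brings .toDF() into scope for RDDs of case classes

// Illustrative stand-in for the cleaned RDD[String] (no header, no '?' rows).
val data = spark.sparkContext.parallelize(Seq(
  "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000",
  "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000"
))

// Split each line on ';' and map the pieces onto the case class fields.
val hhEPCClassRdd = data
  .map(_.split(";"))
  .map(p => HouseholdEPC(p(0).trim, p(1).trim,
    p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toDouble,
    p(6).toDouble, p(7).toDouble, p(8).toDouble))

// Column names and types are inferred from the case class by reflection.
val hhEPCDF = hhEPCClassRdd.toDF()
```

Note that `toDouble` throws a NumberFormatException on a `?` value, which is why the rows with missing values have to be filtered out before this step.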

Display a few sample records from the DataFrame, and count its rows to verify that the total matches the number of rows you expect from the input Dataset (that is, the input rows minus the header and the rows with missing values).
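The check described above could be wrapped in a small helper along these lines. The function name `previewAndVerify` and its parameters are hypothetical; only `show` and `count` are part of the DataFrame API.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: prints a few rows, then checks the row count
// against the number of rows expected after cleaning.
def previewAndVerify(df: DataFrame, expectedRows: Long, sample: Int = 5): Long = {
  df.show(sample)              // print the first `sample` rows as a table
  val actual = df.count()      // triggers a full pass over the data
  require(actual == expectedRows,
    s"expected $expectedRows rows, found $actual")
  actual
}
```

The expected value would typically be the count of the cleaned RDD from the filtering step, so a mismatch points at rows lost or duplicated during the conversion.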
