
Combining data using a JOIN operation

In this section, we introduce the JOIN operation, which we use to combine the daily household electric power consumption data with the weather data. We assume that the household power readings and the weather readings were taken close enough to each other for the weather data to be relevant.

Next, we use the join operation to combine the daily household electric power consumption Dataset with the weather Dataset.
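To make the semantics of this step concrete, here is a minimal, library-free sketch of an inner join on a shared date key. It is not the Spark API and not the original listing; the column names (`global_active_power`, `mean_temp`) and all values are hypothetical, chosen only for illustration.

```python
# Hypothetical daily readings keyed by ISO date string (values are made up).
power = {
    "2007-01-01": {"global_active_power": 1.9},
    "2007-01-02": {"global_active_power": 0.9},
    "2007-01-03": {"global_active_power": 0.7},
}
weather = {
    "2007-01-02": {"mean_temp": 5.0},
    "2007-01-03": {"mean_temp": 6.5},
    "2007-01-04": {"mean_temp": 4.0},
}


def inner_join(left, right):
    """Keep only the keys present on both sides and merge their columns."""
    return {
        key: {**left[key], **right[key]}
        for key in left
        if key in right
    }


joined = inner_join(power, weather)
for day in sorted(joined):
    print(day, joined[day])
```

Only the dates that appear in both inputs survive, which is why the joined result can be smaller than either input.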

Verify that the number of rows in the final DataFrame matches the number of rows expected after the join operation, as follows:
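The check can be sketched in plain Python under one stated assumption: each date occurs at most once in each input. In that case an inner join should produce exactly one row per date shared by both sides. The dates below are hypothetical.

```python
# Hypothetical join keys; each date is assumed to be unique within its input.
power_dates = ["2007-01-01", "2007-01-02", "2007-01-03"]
weather_dates = ["2007-01-02", "2007-01-03", "2007-01-04"]


def expected_inner_join_rows(left_keys, right_keys):
    """With unique keys, an inner join yields one row per shared key."""
    return len(set(left_keys) & set(right_keys))


# Perform the join on the keys alone and count the resulting rows.
weather_lookup = set(weather_dates)
joined_dates = [d for d in power_dates if d in weather_lookup]

expected = expected_inner_join_rows(power_dates, weather_dates)
actual = len(joined_dates)
print(f"expected={expected}, actual={actual}")
```

If the keys are not unique, the expected count is instead the sum, over shared keys, of the product of the per-side occurrence counts, so a mismatch is often a sign of duplicated dates.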

You can compute a series of correlations between columns of the newly joined Dataset, which contains columns from both original Datasets, to get a feel for the strength and direction of the relationships between them, as follows:
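As a reminder of what such a correlation measures, the following sketch computes the Pearson correlation coefficient by hand. It assumes Pearson correlation is the measure of interest and uses made-up numbers; it is not the original listing.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns from a joined dataset.
temperature = [1.0, 2.0, 3.0]
rising = [2.0, 4.0, 6.0]    # moves with temperature
falling = [6.0, 4.0, 2.0]   # moves against temperature

print(pearson(temperature, rising))
print(pearson(temperature, falling))
```

The sign gives the direction of the relationship and the magnitude, between 0 and 1, gives its strength.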

Similarly, you can join the Datasets grouped by year and month to get a higher-level summarization of the data.
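The idea of summarizing by year and month before joining can be sketched as follows. This is a plain-Python illustration with hypothetical values, assuming the monthly summary is a mean; the original may use a different aggregate.

```python
from collections import defaultdict
from datetime import date


def monthly_mean(rows):
    """rows: iterable of (date, value). Returns {(year, month): mean value}."""
    buckets = defaultdict(list)
    for day, value in rows:
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical daily readings.
power_rows = [(date(2007, 1, 1), 2.0), (date(2007, 1, 2), 4.0), (date(2007, 2, 1), 1.0)]
weather_rows = [(date(2007, 1, 1), 5.0), (date(2007, 1, 2), 7.0), (date(2007, 2, 1), 9.0)]

power_monthly = monthly_mean(power_rows)
weather_monthly = monthly_mean(weather_rows)

# Join the two monthly summaries on the (year, month) key.
joined_monthly = {
    key: (power_monthly[key], weather_monthly[key])
    for key in power_monthly
    if key in weather_monthly
}
print(joined_monthly)
```

Aggregating first keeps the join small: each side contributes at most one row per month.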

In order to visualize the summarized data, we can execute the preceding statements in an Apache Zeppelin notebook. For instance, we can plot the monthly Global Reactive Power (GRP) values by transforming joinedMonthlyDF into a table and then selecting the appropriate columns from it, as follows:

Similarly, to analyze readings by the day of the week, follow the steps shown:
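One way to picture this step is to derive a day-of-week label from each date and then aggregate by it. The sketch below uses Python's `datetime.date.weekday()`, where Monday is 0; the readings are hypothetical and the mean is an assumed choice of aggregate.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def mean_by_weekday(rows):
    """rows: iterable of (date, value). Returns {day name: mean value}."""
    buckets = defaultdict(list)
    for day, value in rows:
        # weekday() returns 0 for Monday through 6 for Sunday.
        buckets[DAY_NAMES[day.weekday()]].append(value)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


# Hypothetical readings: two Mondays and one Saturday in January 2007.
readings = [
    (date(2007, 1, 1), 2.0),
    (date(2007, 1, 8), 4.0),
    (date(2007, 1, 6), 3.0),
]

by_weekday = mean_by_weekday(readings)
print(by_weekday)
```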

Finally, we print the schema of the joined Dataset (augmented with the day of the week column) so you can further explore the relationships between various fields of this DataFrame:

In the next section, we shift our focus to munging textual data.
