Summary

In this chapter, we went from creating an RDD to manipulating the data within it. We looked at the transformations and actions available on an RDD and walked through various code examples to explain the differences between them. Finally, we moved on to the more advanced topic of pair RDDs, where we demonstrated how to create a pair RDD and apply some advanced transformations to it.

We are now ready to explain the ETL process and the types of external storage systems that Spark can read data from and write data to, including external filesystems, Apache Hadoop HDFS, Apache Hive, Amazon S3, and so on. We'll also look at connectors for some of the most popular databases, and at how to load data from storage systems and store it back efficiently.

However, before moving on to the next chapter, take a break. You have definitely earned it!
