- Learning Apache Spark 2
- Muhammad Asif Abbasi
- 2021-07-09 18:46:00
Summary
In this chapter, we went from creating an RDD to manipulating the data within it. We looked at the transformations and actions available on an RDD and walked through various code examples to explain the differences between them. Finally, we moved on to the more advanced topic of pair RDDs, where we demonstrated how to create a pair RDD and apply some advanced transformations to it.
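As a quick recap of the difference between transformations and actions, here is a minimal plain-Python sketch that runs without a Spark installation. It is an analogy only, not Spark's API: `MiniRDD`, `reduce_by_key`, and `to_pair` are hypothetical names made up for this illustration. Transformations only build up a recipe, and nothing is evaluated until an action such as `collect` is called.

```python
class MiniRDD:
    """Hypothetical single-machine stand-in for an RDD (NOT Spark's API)."""

    def __init__(self, compute):
        # Zero-argument callable that produces a fresh iterator on demand.
        self._compute = compute

    @staticmethod
    def parallelize(data):
        items = list(data)
        return MiniRDD(lambda: iter(items))

    # --- Transformations: return a new MiniRDD and compute nothing yet ---
    def map(self, f):
        return MiniRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._compute() if pred(x)))

    def reduce_by_key(self, f):
        # Pair-style transformation: merges the values that share a key.
        def compute():
            acc = {}
            for key, value in self._compute():
                acc[key] = f(acc[key], value) if key in acc else value
            return iter(acc.items())
        return MiniRDD(compute)

    # --- Actions: trigger evaluation and return a plain Python value ---
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


evaluated = []  # records every element the mapping function actually sees


def to_pair(word):
    evaluated.append(word)
    return (word, 1)


words = MiniRDD.parallelize(["spark", "rdd", "spark", "pair", "rdd", "spark"])
pairs = words.map(to_pair)                          # transformation: lazy
counts = pairs.reduce_by_key(lambda a, b: a + b)    # transformation: lazy

evaluated_before_action = len(evaluated)  # still 0, nothing has run yet
result = dict(counts.collect())           # action: the whole chain runs now
evaluated_after_action = len(evaluated)   # one entry per input word
```

In this sketch, every action re-runs the whole chain from the source data, which loosely mirrors how an uncached RDD is recomputed each time an action is called on it.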
We are now ready to explain the ETL process and the types of external storage systems that Spark can read data from and write data to, including external filesystems, Apache Hadoop HDFS, Apache Hive, Amazon S3, and so on. We'll also look at connectors for some of the most popular databases, and at how to load data from storage systems and store it back efficiently.
However, before moving on to the next chapter, take a break. You definitely deserve it!