- Apache Spark Quick Start Guide
- Shrey Mehrotra Akash Grade
- 2021-07-02 13:40:00
Spark RDD
Resilient Distributed Datasets (RDDs) are the basic building blocks of a Spark application. An RDD represents a read-only collection of objects distributed across multiple machines. Spark uses RDDs to split a collection of records into partitions and process those partitions in parallel on different machines.
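To make the idea concrete, here is a minimal plain-Python sketch of a read-only collection that is split into partitions and processed in parallel. It is a conceptual model only and does not use the Spark API: the helper names `split_into_partitions` and `map_partitions` are hypothetical, and threads on one machine stand in for the separate machines of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split records into num_partitions contiguous, read-only chunks."""
    size, extra = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional record each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(tuple(records[start:end]))  # tuples are immutable
        start = end
    return partitions


def map_partitions(func, partitions):
    """Apply func to every record, one worker thread per partition."""
    def process(partition):
        # Build a new tuple instead of modifying the input partition.
        return tuple(func(record) for record in partition)

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map returns results in the same order as the partitions.
        return list(pool.map(process, partitions))


records = [1, 2, 3, 4, 5, 6]
partitions = split_into_partitions(records, 3)
squared = map_partitions(lambda x: x * x, partitions)
flattened = [value for partition in squared for value in partition]

print(partitions)  # [(1, 2), (3, 4), (5, 6)]
print(flattened)   # [1, 4, 9, 16, 25, 36]
```

In PySpark the corresponding steps would typically go through `SparkContext.parallelize(records, 3)` followed by `map` and `collect`; the sketch above only mirrors the partition-then-process pattern and leaves out fault tolerance and cluster scheduling.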
In this chapter, we shall learn about the following:
- What is an RDD?
- How do you create RDDs?
- Different operations available to work on RDDs
- Important types of RDD
- Caching an RDD
- Partitions of an RDD
- Drawbacks of using RDDs
The code examples in this chapter are written in Python and Scala only. If you wish to go through the Java and R APIs, you can visit the Spark documentation page at https://spark.apache.org/.