Book title: Apache Spark Quick Start Guide
Authors: Shrey Mehrotra, Akash Grade
Chapter word count: 123
Last updated: 2021-07-02 13:40:00

Spark RDD

Resilient Distributed Datasets (RDDs) are the basic building block of a Spark application. An RDD represents a read-only collection of objects distributed across multiple machines. Spark can distribute a collection of records using an RDD and process them in parallel on different machines.
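
To make this definition concrete, here is a minimal, single-process sketch of the RDD idea in plain Python. It is not Spark's API. `ToyRDD` is a hypothetical class, threads from `concurrent.futures` stand in for the machines of a cluster, and `map` runs eagerly, whereas Spark evaluates transformations lazily. The method names loosely echo PySpark's `SparkContext.parallelize`, `RDD.map`, `RDD.reduce`, `RDD.collect`, and `RDD.getNumPartitions`. The sketch shows the three properties named above: the data is split into partitions, each partition is processed independently, and the collection is read-only, so a transformation returns a new collection instead of modifying the old one.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical toy model of an RDD: a read-only collection split into partitions."""

    def __init__(self, partitions: Iterable[Iterable[Any]]):
        # Freeze every partition as a tuple so the collection cannot be changed in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_slices: int) -> "ToyRDD":
        # Split the input into num_slices contiguous, roughly equal-sized partitions.
        items = list(data)
        size, extra = divmod(len(items), num_slices)
        parts, start = [], 0
        for i in range(num_slices):
            end = start + size + (1 if i < extra else 0)
            parts.append(items[start:end])
            start = end
        return cls(parts)

    def num_partitions(self) -> int:
        return len(self._partitions)

    def partitions(self) -> List[List[Any]]:
        # Expose the partition layout (as fresh lists) for inspection.
        return [list(p) for p in self._partitions]

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Apply f to every partition concurrently and return a NEW ToyRDD; self is untouched.
        # Note: this runs eagerly, unlike Spark's lazy transformations.
        with ThreadPoolExecutor() as pool:
            return ToyRDD(pool.map(lambda part: [f(x) for x in part], self._partitions))

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        # Reduce each non-empty partition concurrently, then combine the partial results.
        non_empty = [p for p in self._partitions if p]
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda part: functools.reduce(f, part), non_empty))
        return functools.reduce(f, partials)

    def collect(self) -> List[Any]:
        # Gather all elements back into a single local list, in partition order.
        return [x for part in self._partitions for x in part]


rdd = ToyRDD.parallelize(range(1, 11), num_slices=4)
squares = rdd.map(lambda x: x * x)
print(squares.num_partitions())             # 4
print(squares.collect())                    # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(squares.reduce(lambda a, b: a + b))   # 385
print(rdd.collect() == list(range(1, 11)))  # True: the source ToyRDD is unchanged
```

Assuming an existing `SparkContext` named `sc`, the equivalent PySpark computation would be `sc.parallelize(range(1, 11), 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`. Spark may slice the input into partitions differently from this toy version, but the result is the same.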

In this chapter, we shall learn about the following:

- What is an RDD?
- How do you create RDDs?
- Different operations available to work on RDDs
- Important types of RDD
- Caching an RDD
- Partitions of an RDD
- Drawbacks of using RDDs

The code examples in this chapter are written in Python and Scala only. For the Java and R APIs, refer to the Spark documentation at https://spark.apache.org/.
