Spark RDD

Resilient Distributed Datasets (RDDs) are the basic building blocks of a Spark application. An RDD is a read-only collection of objects that is partitioned across multiple machines. Spark uses an RDD to distribute a collection of records and to process those records in parallel on different machines.

In this chapter, we shall learn about the following:

    • What an RDD is
    • How to create RDDs
    • Different operations available to work on RDDs
    • Important types of RDD
    • Caching an RDD
    • Partitions of an RDD
    • Drawbacks of using RDDs

The code examples in this chapter are written in Python and Scala only. If you wish to go through the Java and R APIs, you can visit the Spark documentation page at https://spark.apache.org/
