
How to do it...

To quickly create an RDD, run PySpark on your machine via the bash terminal, or run the same code in a Jupyter notebook. There are two ways to create an RDD in PySpark: you can either pass a collection (a list or an array of elements) to the parallelize() method, or reference a file (or files) located either locally or in an external source, as noted in subsequent recipes.

The following code snippet creates your RDD (myRDD) using the sc.parallelize() method:

myRDD = sc.parallelize([('Mike', 19), ('June', 18), ('Rachel', 16), ('Rob', 18), ('Scott', 17)])

To view what is inside your RDD, you can run the following code snippet:

myRDD.take(5)

The output is as follows:

Out[10]: [('Mike', 19), ('June', 18), ('Rachel', 16), ('Rob', 18), ('Scott', 17)]