
Creating RDDs

RDDs can be created from existing Scala collections, for example, in the Scala Spark shell that you launched earlier:

val collection = List("a", "b", "c", "d", "e") 
val rddFromCollection = sc.parallelize(collection)
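As a quick sanity check, you can run a few actions on this RDD in the same shell session; the expected results follow directly from the five-element list above:

```scala
// Actions trigger computation and return results to the driver program.
rddFromCollection.count   // returns 5, one record per element of the list
rddFromCollection.first   // returns "a", the first element
rddFromCollection.collect // returns Array("a", "b", "c", "d", "e")
```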

RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.

The following code is an example of creating an RDD from a text file located on the local filesystem:

val rddFromTextFile = sc.textFile("LICENSE")

The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:

rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE   
MapPartitionsRDD[1] at textFile at <console>:24
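Because each record is one line of the file, the usual actions and transformations apply directly. The snippet below is a small illustration in the same shell session; the exact numbers depend on the contents of your LICENSE file, so no specific outputs are shown:

```scala
// Count the number of lines (records) in the file
val numLines = rddFromTextFile.count

// Transform each line into its length, then sum to get the total character count
val totalChars = rddFromTextFile.map(line => line.length).sum
```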

The following code is an example of how to create an RDD from a text file located on HDFS using the hdfs:// protocol:

val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")

The following code is an example of how to create an RDD from a text file located on Amazon S3 using the s3n:// protocol:

val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")