- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
- 190 words
- 2021-07-09 21:07:41
Creating RDDs
RDDs can be created from existing collections, for example, in the Scala Spark shell that you launched earlier:
val collection = List("a", "b", "c", "d", "e")
val rddFromCollection = sc.parallelize(collection)
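The same call can also be tried outside the shell. The sketch below is a minimal example under stated assumptions: spark-core is on the classpath, Spark runs in local mode instead of using the shell's ready-made `sc`, and the object name `ParallelizeSketch` is hypothetical. It passes the optional second argument of `parallelize` (`numSlices`) to choose the number of partitions and uses `glom()` to show what each partition holds.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core on the classpath; ParallelizeSketch is a hypothetical name.
object ParallelizeSketch {
  def run(): (Long, Int, List[String]) = {
    // Local mode with two threads stands in for the shell's prebuilt `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("parallelize-sketch").setMaster("local[2]"))
    try {
      val collection = List("a", "b", "c", "d", "e")
      // numSlices sets how many partitions the local collection is split into.
      val rdd = sc.parallelize(collection, numSlices = 2)
      // glom() gathers each partition into an array so the split can be inspected.
      rdd.glom().collect().foreach(part => println(part.mkString("[", ", ", "]")))
      // Return the element count, the partition count, and the collected elements.
      (rdd.count(), rdd.getNumPartitions, rdd.collect().toList)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With five elements and two slices, Spark splits the list by position, so the printed partitions should be `[a, b]` and `[c, d, e]`, and `collect()` returns the elements in their original order.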
RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can use any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, Tachyon, and many more.
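To make the InputFormat point concrete, a file can be read through an explicitly named InputFormat with `newAPIHadoopFile`. This is a sketch under assumptions: spark-core (with the Hadoop client classes it pulls in) is on the classpath, the input is a hypothetical two-line temporary file, and `InputFormatSketch` is a made-up name. Hadoop's `TextInputFormat` yields pairs of (byte offset of the line, line contents).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes spark-core and its Hadoop client classes on the classpath.
object InputFormatSketch {
  def run(): List[(Long, String)] = {
    // Hypothetical sample data written to a temporary local file.
    val path = Files.createTempFile("inputformat-sketch", ".txt")
    Files.write(path, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
    val sc = new SparkContext(new SparkConf().setAppName("inputformat-sketch").setMaster("local[1]"))
    try {
      // Key, value, and InputFormat types are named explicitly.
      val pairs = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](path.toUri.toString)
      // Hadoop reuses Writable instances, so copy them into plain values before collecting.
      pairs.map { case (offset, line) => (offset.get, line.toString) }.collect().toList
    } finally {
      sc.stop()
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Here the result should be `(0,first line)` and `(11,second line)`, since the first line plus its newline occupies 11 bytes. Swapping in another InputFormat class (and matching key/value types) is how other Hadoop sources are plugged in.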
The following code is an example of creating an RDD from a text file located on the local filesystem:
val rddFromTextFile = sc.textFile("LICENSE")
The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. The output of the preceding command is as follows:
rddFromTextFile: org.apache.spark.rdd.RDD[String] = LICENSE MapPartitionsRDD[1] at textFile at <console>:24
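Because each record is one line, ordinary RDD operations work line by line. The following is a minimal sketch, assuming spark-core on the classpath and a hypothetical three-line temporary file standing in for `LICENSE` (`TextFileSketch` is likewise a made-up name). It also uses the optional `minPartitions` argument of `textFile`, which is a hint for the minimum number of partitions rather than an exact count.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the temporary file is a hypothetical stand-in for LICENSE.
object TextFileSketch {
  def run(): (Long, String, Int) = {
    val path = Files.createTempFile("textfile-sketch", ".txt")
    Files.write(path, "line one\nline two\nline three\n".getBytes(StandardCharsets.UTF_8))
    val sc = new SparkContext(new SparkConf().setAppName("textfile-sketch").setMaster("local[2]"))
    try {
      // minPartitions asks Hadoop for at least roughly this many input splits.
      val lines = sc.textFile(path.toUri.toString, minPartitions = 2)
      // Each element is one line with its trailing newline removed.
      val totalChars = lines.map(_.length).reduce(_ + _)
      (lines.count(), lines.first(), totalChars)
    } finally {
      sc.stop()
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

For this input the sketch should return `(3,line one,26)`: three lines, the first line, and 8 + 8 + 10 characters, with no newline characters counted.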
The following code is an example of how to create an RDD from a text file located in HDFS using the hdfs:// protocol:
val rddFromTextFileHDFS = sc.textFile("hdfs://input/LICENSE")
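One detail worth checking in HDFS paths: in a URI of the form scheme://authority/path, the part right after `//` is the authority, which for hdfs:// is the NameNode host and optional port. In `hdfs://input/LICENSE`, therefore, `input` is read as a host name rather than a directory. The sketch below uses only `java.net.URI` to show how three common spellings break down; `namenode.example.com:8020` is a hypothetical address and `HdfsUriSketch` a made-up name. When the authority is left empty (`hdfs:///...`), Hadoop falls back to the file system configured in `fs.defaultFS`.

```scala
import java.net.URI

// Illustration only: parses hdfs:// strings with java.net.URI; no cluster is contacted.
object HdfsUriSketch {
  // Returns (host, port, path); the port is -1 when the URI does not specify one.
  def parts(uri: String): (Option[String], Int, String) = {
    val u = new URI(uri)
    (Option(u.getHost), u.getPort, u.getPath)
  }

  def main(args: Array[String]): Unit = {
    Seq(
      "hdfs://input/LICENSE",                           // "input" is parsed as the host
      "hdfs:///input/LICENSE",                          // empty authority: fs.defaultFS is used
      "hdfs://namenode.example.com:8020/input/LICENSE"  // hypothetical fully qualified form
    ).foreach(u => println(s"$u -> ${parts(u)}"))
  }
}
```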
The following code is an example of how to create an RDD from a text file located in Amazon S3 using the s3n:// protocol:
val rddFromTextFileS3 = sc.textFile("s3n://input/LICENSE")
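Reading from S3 also needs credentials. One common approach, sketched here under the assumption that the s3n connector (hadoop-aws) is available, is to set them on the SparkContext's Hadoop configuration before the first action that reads from S3. In `s3n://input/LICENSE`, `input` is the bucket name. The key values below are placeholders and `S3ConfigSketch` is a hypothetical name. Hadoop 3.x no longer ships s3n; with s3a:// URIs the corresponding properties are `fs.s3a.access.key` and `fs.s3a.secret.key`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the credential values are placeholders, not working keys.
object S3ConfigSketch {
  // Stores AWS credentials in the Hadoop configuration used by s3n:// reads.
  def configureS3n(sc: SparkContext, accessKey: String, secretKey: String): Unit = {
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKey)
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretKey)
  }

  def run(): (String, String) = {
    val sc = new SparkContext(new SparkConf().setAppName("s3-sketch").setMaster("local[1]"))
    try {
      // In practice the keys would come from the environment or a credentials provider.
      configureS3n(sc, "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY")
      // Read the values back to confirm they were stored.
      (sc.hadoopConfiguration.get("fs.s3n.awsAccessKeyId"),
       sc.hadoopConfiguration.get("fs.s3n.awsSecretAccessKey"))
    } finally {
      sc.stop()
    }
  }
}
```

After this configuration step, `sc.textFile("s3n://input/LICENSE")` is called as before. Since `textFile` is lazy, S3 is only contacted when an action such as `count()` runs.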