- Apache Spark 2.x for Java Developers
- Sourav Gulati, Sumit Kumar
Creating and filtering an RDD
Let's start by creating an RDD of strings:
scala> val stringRdd = sc.parallelize(Array("Java","Scala","Python","Ruby","JavaScript","Java"))
stringRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
Now, we will filter this RDD to keep only those strings that start with the letter J:
scala> val filteredRdd = stringRdd.filter(s => s.startsWith("J"))
filteredRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:26
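Since this book targets Java developers, here is how the same two steps look with the Java RDD API. This is a minimal sketch, assuming Spark 2.x in local mode; the class name FilterRddExample and the application name are illustrative:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class FilterRddExample {
    public static void main(String[] args) {
        // Local context just for this sketch; a real job would get its
        // master from spark-submit instead
        SparkConf conf = new SparkConf().setAppName("FilterRddExample").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Equivalent of sc.parallelize(Array(...)) in the shell
        JavaRDD<String> stringRdd = jsc.parallelize(
                Arrays.asList("Java", "Scala", "Python", "Ruby", "JavaScript", "Java"));

        // filter is a transformation: it returns a new RDD and nothing executes yet
        JavaRDD<String> filteredRdd = stringRdd.filter(s -> s.startsWith("J"));

        // The snippets later in this section continue with these two RDDs
    }
}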
In the first chapter, we learnt that if an operation on an RDD returns an RDD, it is a transformation; otherwise, it is an action.
The output of the preceding command clearly shows that the filter operation returned an RDD, so filter is a transformation.
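A quick way to convince yourself that filter is lazy is to put a side effect inside the predicate. Continuing the Java sketch above (in local mode, so the output appears on the console), nothing prints until an action forces evaluation:

        // The println does NOT run when filter is called...
        JavaRDD<String> traced = stringRdd.filter(s -> {
            System.out.println("checking " + s);
            return s.startsWith("J");
        });

        // ...it runs only here, when the count action triggers a job:
        // "checking ..." prints six times and matches is 3
        long matches = traced.count();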
Now, we will run an action on filteredRdd to see its elements. Let's run collect on filteredRdd:
scala> val list = filteredRdd.collect
list: Array[String] = Array(Java, JavaScript, Java)
As per the output of the previous command, the collect operation returned an array of strings, so it is an action.
Now, let's see the elements of the list variable:
scala> list
res5: Array[String] = Array(Java, JavaScript, Java)
We are left with only the elements that start with J, which was our desired outcome.
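For completeness, the same action in the Java API returns a java.util.List rather than a Scala Array. Continuing the sketch above (this needs an additional import java.util.List;):

        // collect is an action: it ships every matching element back to the driver
        List<String> list = filteredRdd.collect();
        System.out.println(list); // [Java, JavaScript, Java]

Keep in mind that collect copies the entire RDD to the driver, so it is only safe for results small enough to fit in driver memory.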