
Passing functions to Spark (Scala)

As you have seen in the previous example, passing functions is a critical capability of Spark. From a user's point of view, you pass the function in your driver program, and Spark works out where the data partitions live across the cluster and runs the function on them in parallel. The exact syntax for passing functions differs by programming language. Since Spark is written in Scala, we'll discuss Scala first.

In Scala, the recommended ways to pass functions to the Spark framework are as follows:

  • Anonymous functions
  • Static singleton methods

Anonymous functions

Anonymous functions are used for short pieces of code. They are also referred to as lambda expressions, and are an elegant feature of the programming language. They are called anonymous because the function itself is never given a name; it is defined inline at the point where it is used. The name of the input argument is arbitrary as well, so you can give it any name and the result would be the same.

For example, the following code examples would produce the same output:

val words = dataFile.map(line => line.split(" ")) 
val words = dataFile.map(anyline => anyline.split(" ")) 
val words = dataFile.map(_.split(" ")) 

Figure 2.11: Passing anonymous functions to Spark in Scala
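To check the equivalence for yourself, the three forms can be run against a small in-memory RDD. The following is a minimal sketch rather than the book's own listing: it assumes Spark 2.x or later on the classpath, and the object name, the application name, the local master setting, and the sample sentences standing in for dataFile are all illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object AnonymousFunctionsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("anonymous-functions-sketch")
      .master("local[*]")
      .getOrCreate()

    // Stand-in for dataFile: a small in-memory RDD[String].
    val dataFile = spark.sparkContext.parallelize(Seq("to be or", "not to be"))

    // The same anonymous function written three ways.
    val named       = dataFile.map(line => line.split(" "))
    val renamed     = dataFile.map(anyline => anyline.split(" "))
    val placeholder = dataFile.map(_.split(" "))

    // Arrays compare by reference, so convert them to lists before comparing.
    val results = Seq(named, renamed, placeholder)
      .map(rdd => rdd.collect().map(_.toList).toList)
    assert(results.distinct.size == 1)

    spark.stop()
  }
}
```

The underscore form is the most compact, but it only works when the argument is used exactly once in the expression; otherwise a named argument is needed.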

Static singleton functions

While anonymous functions work well for short snippets of code, they become unwieldy when you want the framework to perform a complex data manipulation. Static singleton functions come to the rescue, with their own nuances, which we will discuss in this section.

Note

In software engineering, the Singleton pattern is a design pattern that restricts instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system.

Static methods belong to the class and not to an instance of it. They usually take their input from the parameters, perform actions on it, and return the result. Scala has no static keyword; the equivalent is a method defined on a singleton object.

Figure 2.12: Passing static singleton functions to Spark in Scala
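As a rough illustration of the pattern described above, a function can be defined on a Scala object and passed to Spark by name. This is a sketch under stated assumptions, not the original figure: it assumes Spark 2.x or later on the classpath, and SplitFunctions, SingletonFunctionSketch, the local master setting, and the sample data are hypothetical names and values chosen for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// A Scala object is a singleton; its methods play the role of static methods.
// SplitFunctions is a hypothetical name used only for this sketch.
object SplitFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
}

object SingletonFunctionSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("singleton-function-sketch")
      .master("local[*]")
      .getOrCreate()

    // Stand-in for dataFile: a small in-memory RDD[String].
    val dataFile: RDD[String] =
      spark.sparkContext.parallelize(Seq("to be or", "not to be"))

    // Pass the singleton's method by name; no enclosing class instance is captured.
    val words: RDD[Array[String]] = dataFile.map(SplitFunctions.split)

    val collected = words.collect().map(_.toList).toList
    assert(collected == List(List("to", "be", "or"), List("not", "to", "be")))

    spark.stop()
  }
}
```

Because split lives on an object rather than on a class instance, the closure passed to map does not need to carry any instance state with it.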

Static singleton functions are the preferred way to pass functions, although technically you can also create a class and pass a method of a class instance. For example:

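import org.apache.spark.rdd.RDD

class UtilFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
  // rdd.map(split) means rdd.map(this.split), so the whole instance is captured
  def operate(rdd: RDD[String]): RDD[Array[String]] = rdd.map(split)
}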

You can pass a method of a class instance, but this has performance implications: because the method belongs to the object, the entire object has to be sent to the cluster along with the method.
