
Passing functions to Spark (Scala)

As the previous example showed, passing functions is central to how Spark works. From the user's point of view, you pass a function in your driver program, and Spark works out where the data partitions live across the cluster and runs the function on them in parallel. The exact syntax for passing functions differs by programming language. Since Spark is written in Scala, we'll discuss Scala first.

In Scala, the recommended ways to pass functions to the Spark framework are as follows:

  • Anonymous functions
  • Static singleton methods

Anonymous functions

Anonymous functions are used for short pieces of code. They are also referred to as lambda expressions, and are an elegant feature of the programming language. They are called anonymous because the function itself is never bound to a name; it is defined inline at the point where it is used. The name of the input argument is arbitrary, so you can choose any name and the result will be the same.

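For example, each of the following lines expresses the same transformation and produces the same output (they are alternatives, so use only one of them in a given scope):

val words = dataFile.map(line => line.split(" "))        // named argument
val words = dataFile.map(anyline => anyline.split(" "))  // same function, different argument name
val words = dataFile.map(_.split(" "))                   // placeholder syntax, argument left unnamed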

Figure 2.11: Passing anonymous functions to Spark in Scala
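
The equivalence of the three forms can be checked without a cluster. The following is a minimal sketch that assumes a local Seq[String] as a stand-in for the dataFile RDD (the object name AnonymousFunctionForms and the sample lines are made up for illustration); map accepts the function in the same way in both cases.

```scala
object AnonymousFunctionForms {
  // Assumption: a local Seq stands in for the RDD[String] named dataFile in the text.
  val dataFile: Seq[String] = Seq("to be or", "not to be")

  // Arrays compare by reference in Scala, so each result is converted
  // to a List to allow a value comparison.
  val named: Seq[List[String]] = dataFile.map(line => line.split(" ").toList)
  val renamed: Seq[List[String]] = dataFile.map(anyline => anyline.split(" ").toList)
  val placeholder: Seq[List[String]] = dataFile.map(_.split(" ").toList)

  def main(args: Array[String]): Unit = {
    // All three forms define the same function, so the results match.
    assert(named == renamed && renamed == placeholder)
    println(named)
  }
}
```

The argument name only exists inside the function body, which is why renaming it or replacing it with the underscore placeholder changes nothing.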

Static singleton functions

Anonymous functions work well for short snippets of code, but they become unwieldy when you want the framework to perform a complex data manipulation. Static singleton functions come to the rescue, with their own nuances, which we will discuss in this section.

Note

In software engineering, the Singleton pattern is a design pattern that restricts instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system.

Static methods belong to the class and not to an instance of it. They usually take their input as parameters, operate on it, and return the result. Scala has no static keyword; methods defined on a singleton object play this role.

Figure 2.12: Passing static singleton functions to Spark in Scala
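
A minimal sketch of this approach follows. It assumes Spark is on the classpath and runs against a local master; the names UtilFunctions, SingletonExample, and run, as well as the sample input, are illustrative choices rather than anything prescribed by Spark.

```scala
import org.apache.spark.sql.SparkSession

// A Scala `object` is a singleton; its methods play the role of static methods.
object UtilFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
}

object SingletonExample {
  // Splits each line into words using a local Spark session (assumed setup).
  def run(lines: Seq[String]): List[List[String]] = {
    val spark = SparkSession.builder()
      .appName("SingletonExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val dataFile = spark.sparkContext.parallelize(lines)
      // The singleton's method is passed directly; no class instance is involved.
      val words = dataFile.map(UtilFunctions.split)
      words.collect().map(_.toList).toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run(Seq("to be or", "not to be")).foreach(println)
}
```

Because split lives on a singleton object, the function passed to map does not depend on the state of any particular instance.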

Passing a static singleton method is the preferred approach. Technically, you can also create a class and pass a method of a class instance. For example:

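import org.apache.spark.rdd.RDD

class UtilFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
  // map(split) yields one Array[String] per line, so the result is RDD[Array[String]]
  def operate(rdd: RDD[String]): RDD[Array[String]] = rdd.map(split)
}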

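You can pass a method of a class instance in this way, but it has performance implications: the method refers to the instance it belongs to, so the entire object is sent to the cluster along with the method. The object must therefore also be serializable.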
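
One way to avoid shipping the whole object, shown here as a sketch rather than the only option, is to keep the logic in a local function value that does not refer to the enclosing instance. The method names operateViaMethod and operateViaLocal and the object LocalFunctionExample are hypothetical names introduced for this illustration, and a local master is assumed.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

class UtilFunctions { // deliberately not Serializable
  def split(inputParam: String): Array[String] = inputParam.split(" ")

  // rdd.map(split) refers to this.split, so the closure captures the whole instance.
  // With a non-serializable class like this one, Spark is expected to reject the task.
  def operateViaMethod(rdd: RDD[String]): RDD[Array[String]] = rdd.map(split)

  // The local function value does not reference `this`, so only the function is shipped.
  def operateViaLocal(rdd: RDD[String]): RDD[Array[String]] = {
    val splitFn = (s: String) => s.split(" ")
    rdd.map(splitFn)
  }
}

object LocalFunctionExample {
  def run(lines: Seq[String]): List[List[String]] = {
    val spark = SparkSession.builder()
      .appName("LocalFunctionExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val rdd = spark.sparkContext.parallelize(lines)
      new UtilFunctions().operateViaLocal(rdd).collect().map(_.toList).toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run(Seq("to be or", "not to be")).foreach(println)
}
```

Moving split into a singleton object achieves the same effect and is usually the simpler choice when the function does not need any instance state.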
