
Passing functions to Spark (Scala)

As you have seen in the previous example, passing functions is a critical piece of functionality provided by Spark. From a user's point of view, you pass the function in your driver program, and Spark figures out where the data partitions reside in the cluster's memory and runs the function on them in parallel. The exact syntax for passing functions differs by programming language. Since Spark is written in Scala, we'll discuss Scala first.

In Scala, the recommended ways to pass functions to the Spark framework are as follows:

  • Anonymous functions
  • Static singleton methods

Anonymous functions

Anonymous functions are used for short pieces of code. They are also referred to as lambda expressions, and are a concise and elegant feature of the programming language. They are called anonymous functions because they are defined inline, at the point of use, without being given a name. The name of the input parameter is also up to you: renaming it, or replacing it with Scala's _ placeholder, does not change the result.

For example, the following code examples would produce the same output:

// Three equivalent alternatives: use any one of them
val words = dataFile.map(line => line.split(" "))
val words = dataFile.map(anyline => anyline.split(" "))
val words = dataFile.map(_.split(" "))

Figure 2.11: Passing anonymous functions to Spark in Scala
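The three forms above can be checked without a cluster. The following minimal sketch assumes a local Seq[String] as a stand-in for the dataFile RDD (Seq.map and RDD.map both accept a String => Array[String] function here); AnonymousFunctionForms, words1 to words3, and asSeqs are illustrative names. Distinct val names are used because the same val cannot be declared three times in one scope.

```scala
// Hypothetical demo object; a local Seq stands in for the dataFile RDD
object AnonymousFunctionForms {
  val dataFile: Seq[String] = Seq("to be or", "not to be")

  // 1. Parameter named `line`
  val words1: Seq[Array[String]] = dataFile.map(line => line.split(" "))
  // 2. Any other parameter name gives an identical result
  val words2: Seq[Array[String]] = dataFile.map(anyline => anyline.split(" "))
  // 3. Placeholder syntax: `_` stands for the single parameter
  val words3: Seq[Array[String]] = dataFile.map(_.split(" "))

  // Arrays compare by reference, so convert to Seq before comparing contents
  def asSeqs(ws: Seq[Array[String]]): Seq[Seq[String]] = ws.map(_.toSeq)

  def main(args: Array[String]): Unit = {
    println(asSeqs(words1))
    println(asSeqs(words1) == asSeqs(words2) && asSeqs(words2) == asSeqs(words3))
  }
}
```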

Static singleton functions

While anonymous functions are very helpful for short snippets of code, they become hard to read and maintain when you want the framework to perform a complex data manipulation. Static singleton functions come to the rescue, with their own nuances, which we discuss in this section.

Note

In software engineering, the Singleton pattern is a design pattern that restricts instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system.

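Static methods belong to the class rather than to an instance of it. They usually take their input from their parameters, operate on it, and return the result. Scala has no static keyword; the equivalent is a method defined on a singleton object, which is declared with object instead of class.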

Figure 2.12: Passing static singleton functions to Spark in Scala
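A minimal sketch of the singleton style follows, assuming a hypothetical WordFunctions object. A local Seq stands in for an RDD[String] so the snippet runs without a cluster; with a real RDD the call has the same shape, for example dataFile.map(WordFunctions.splitLine). The normalizedWords method is an illustrative example of multi-step logic that reads better as a named method than as an inline lambda.

```scala
// Hypothetical singleton object holding reusable transformation logic
object WordFunctions {
  // Defined on an object, so calling it needs no enclosing class instance
  def splitLine(line: String): Array[String] = line.split(" ")

  // Multi-step logic is easier to read and test as a named method than inline
  def normalizedWords(line: String): Array[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Local stand-in for an RDD[String]; RDD.map takes the same function argument
    val dataFile = Seq("Spark passes Functions", "  to   the Cluster ")
    val words = dataFile.map(WordFunctions.splitLine)
    val normalized = dataFile.map(WordFunctions.normalizedWords)
    words.foreach(ws => println(ws.mkString("|")))
    normalized.foreach(ws => println(ws.mkString(",")))
  }
}
```

Note that the plain splitLine keeps empty strings produced by repeated spaces, which is one reason longer cleanup logic tends to move into a named method.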

Static singleton functions are the preferred way to pass functions. Technically, you can also create a class and pass a method of a class instance, for example:

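import org.apache.spark.rdd.RDD

// Serializable so Spark can ship it: rdd.map(split) means rdd.map(x => this.split(x))
class UtilFunctions extends Serializable {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
  def operate(rdd: RDD[String]): RDD[Array[String]] = rdd.map(split)
}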

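You can pass a method of a class instance in this way, but it has performance implications: rdd.map(split) is shorthand for rdd.map(x => this.split(x)), so Spark must serialize the entire UtilFunctions object and send it along with the method. If the class is not serializable, the job fails with a "Task not serializable" error instead.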
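To see why the singleton form avoids this, the sketch below loosely mimics Spark's closure check locally, without Spark: it Java-serializes a function value built from an instance method of a class that is not serializable, and one built from a method on a singleton object. It assumes Scala 2.12 or 2.13, where function literals are serializable and capture this only when they use it. PlainUtilFunctions, UtilFunctionsObject, ClosureCheck, splitFn, and serializes are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stripped-down variant of UtilFunctions that does NOT extend Serializable
class PlainUtilFunctions {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
  // Eta-expanding `split` yields x => this.split(x): the function captures `this`
  def splitFn: String => Array[String] = split
}

// Hypothetical singleton alternative: no instance is needed to call split
object UtilFunctionsObject {
  def split(inputParam: String): Array[String] = inputParam.split(" ")
}

object ClosureCheck {
  // Loosely mimics Spark's check: try to Java-serialize the function value
  def serializes(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Refers to the singleton object statically, so no instance is captured
  val singletonFn: String => Array[String] = UtilFunctionsObject.split

  def main(args: Array[String]): Unit = {
    println(s"instance method serializable: ${serializes(new PlainUtilFunctions().splitFn)}")
    println(s"singleton method serializable: ${serializes(singletonFn)}")
  }
}
```

In a real Spark job, passing new PlainUtilFunctions().splitFn to rdd.map would fail with a "Task not serializable" error, while UtilFunctionsObject.split can be passed as-is.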
