官术网_书友最值得收藏!

Cluster mode using different managers

The following diagram shows how Spark applications run on a cluster. They are independent sets of processes that are coordinated by the SparkContext object in the Driver Program. SparkContext connects to a Cluster Manager, which is responsible for allocating resources across applications. Once the SparkContext is connected, Spark gets executors across cluster nodes.

Executors are processes that execute computations and store data for a given Spark application. SparkContext sends the application code (which could be a JAR file for Scala or .py files for Python) to the executors. Finally, it sends the tasks to run to the executors:

Figure 1.12

To isolate applications from each other, every Spark application receives its own executor processes. They stay alive for the duration of the whole application and run tasks in multithreading mode. The downside to this is that it isn't possible to share data across different Spark applications  to share it, data needs to be persisted to an external storage system.

Spark supports different cluster managers, but it is agnostic to the underlying type.

The driver program, at execution time, must be network addressable from the worker nodes because it has to listen for and accept incoming connections from its executors. Because it schedules tasks on the cluster, it should be executed close to the worker nodes, on the same local area network (if possible).

The following are the cluster managers that are currently supported in Spark:

  • Standalone: A simple cluster manager that makes it easy to set up a cluster. It is included with Spark.
  • Apache Mesos: An open source project that's used to manage computer clusters, and was developed at the University of California, Berkeley.
  • Hadoop YARN: The resource manager available in Hadoop starting from release 2.
  • Kubernetes: An open source platform for providing a container-centric infrastructure. Kubernetes support in Spark is still experimental, so it's probably not ready for production yet.
主站蜘蛛池模板: 田林县| 玛曲县| 虎林市| 柳林县| 泗水县| 普兰县| 余江县| 威远县| 临猗县| 富锦市| 西乌珠穆沁旗| 普兰县| 祁东县| 遵义市| 华阴市| 南岸区| 东乡县| 两当县| 兰考县| 磴口县| 仁怀市| 樟树市| 炉霍县| 台湾省| 甘谷县| 五常市| 泰来县| 会理县| 泗阳县| 调兵山市| 宝丰县| 浏阳市| 岑溪市| 镇赉县| 辛集市| 江达县| 萨嘎县| 乌拉特中旗| 阳山县| 盐源县| 苍南县|