
Spark architecture overview

Spark follows a master-slave architecture, which allows it to scale on demand. Spark's architecture has two main components:

  • Driver Program: The driver program is where a user writes Spark code using the Scala, Java, Python, or R APIs. It is responsible for launching various parallel operations on the cluster.
  • Executor: An executor is a Java Virtual Machine (JVM) process that runs on a worker node of the cluster. Executors provide the compute resources (CPU cores and memory) for running the tasks launched by the driver program.
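The division of labor described above can be sketched in plain Python. The following is a minimal simulation, not Spark itself: a "driver" splits a dataset into partitions and hands one task per partition to a pool of "executors" (real Spark executors are separate JVM processes on worker nodes; threads are used here only to keep the sketch self-contained). The names `run_task`, `data`, and `partitions` are illustrative, not part of any Spark API.

```python
from concurrent.futures import ThreadPoolExecutor

# A "task" the driver wants executed: square every element of one partition.
def run_task(partition):
    return [x * x for x in partition]

# Driver side: split the dataset into partitions, one task per partition.
data = list(range(10))
partitions = [data[0:5], data[5:10]]

# Hand the tasks to a pool of "executors" and run them in parallel.
with ThreadPoolExecutor(max_workers=2) as executors:
    results = list(executors.map(run_task, partitions))

# Flatten the per-partition results back on the driver side.
collected = [y for part in results for y in part]
# collected == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The key point of the sketch is that the driver only plans and dispatches work; the actual computation happens inside the executor pool.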

As soon as a Spark job is submitted, the driver program launches various operations on each executor. The driver and the executors together make up an application.
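Jobs are typically submitted with the `spark-submit` script that ships with Spark. The invocation below is a hypothetical sketch: the YARN master, executor counts, memory sizes, and application file name are placeholder values, not taken from the text.

```shell
spark-submit \
  --master yarn \            # hypothetical cluster manager; placeholder
  --num-executors 4 \        # request four executor JVMs
  --executor-cores 2 \       # CPU cores per executor
  --executor-memory 2g \     # memory per executor
  my_spark_app.py            # the driver program (user code); placeholder name
```

Once submitted, the driver program defined in the application file coordinates the requested executors for the lifetime of the application.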

The following diagram demonstrates the relationships between the Driver, Workers, and Executors. As a first step, the driver process parses the user code (the Spark program) and creates multiple executors on each worker node. The driver process not only forks the executors on the worker machines, but also sends tasks to these executors so that the entire application runs in parallel.

Once the computation is completed, the output is either sent back to the driver program or saved to the file system:
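These two output paths can also be mimicked in plain Python (again a simulation, not Spark): each "executor" either returns its partial result to the driver, as `collect()` does in Spark, or writes it to a `part-NNNNN` file, the naming convention Spark uses when saving output with one file per partition. All names below are illustrative.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def run_task(partition):
    return [x * x for x in partition]

partitions = [[0, 1, 2], [3, 4, 5]]

with ThreadPoolExecutor(max_workers=2) as executors:
    results = list(executors.map(run_task, partitions))

# Path 1: results are sent back to the driver (like collect() in Spark).
collected = [y for part in results for y in part]

# Path 2: each partition's result is saved to the file system, one
# part-NNNNN file per partition (like saveAsTextFile() in Spark).
out_dir = tempfile.mkdtemp()
for i, part in enumerate(results):
    with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
        f.write("\n".join(str(y) for y in part))

saved = sorted(os.listdir(out_dir))
# saved == ['part-00000', 'part-00001']
```

Returning everything to the driver is convenient for small results, but for large outputs writing directly from the executors to the file system avoids funneling all data through a single process.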

Driver, Workers, and Executors