官术网_书友最值得收藏!

Standalone mode

For standalone mode, you only need to place a compiled version of Spark on each node of the cluster. All the cluster nodes need to be able to resolve the hostnames of the other cluster members and are routable to one another. The Spark master URL can be configured in the $SPARK_HOME/conf/spark-defaults.conf file on all of the nodes:

spark.master                     spark://<master_hostname_or_IP>:7077

Then, the hostname or IP address of the Spark master node needs to be specified in the $SPARK_HOME/conf/spark-env.sh file on all of the nodes, as follows:

SPARK_MASTER_HOST,               <master_hostname_or_IP>

It is now possible to start a standalone master server by executing the following script:

$SPARK_HOME/sbin/start-master.sh

Once the master has completed, a web UI will be available at the http://<master_hostname_or_IP>:8080 URL. From there, it is possible to obtain the master URL that's to be used when starting the workers. One or more workers can now be started by executing the following script:

$SPARK_HOME/sbin/start-slave.sh <master-spark-URL>

Each worker, after the start, comes with its own web UI, whose URL is http://<worker_hostname_or_IP>:8081.

The list of workers, along with other information about their number of CPUs and memory, can be found in the master's web UI.

The way to do this is to run a standalone cluster manually. It is also possible to use the provided launch scripts. A $SPARK_HOME/conf/slaves file needs to be created as a preliminary step. It must contain the hostnames  one per line  of all of the machines where the Spark workers should start. Passwordless SSH (short for Secure Shell) for the Spark master to the Spark slaves needs to be enabled to allow remote login for the slave daemon startup and shutdown actions. A cluster can then be launched or stopped using the following shell scripts, which are available in the $SPARK_HOME/sbin directory:

  • start-master.sh: Starts a master instance
  • start-slaves.sh: Starts a slave instance on each machine specified in the conf/slaves file
  • start-slave.sh: Starts a single slave instance
  • start-all.sh: Starts both a master and a number of slaves
  • stop-master.sh: Stops a master that has been started via the sbin/start-master.sh script
  • stop-slaves.sh: Stops all slave instances on the nodes specified in the conf/slaves file
  • stop-all.sh: Stops both a master and its slaves

These scripts must be executed on the machine the Spark master will run on.

It is possible to run an interactive Spark shell against a cluster in the following way:

$SPARK_HOME/bin/spark-shell --master <master-spark-URL>

The $SPARK_HOME/bin/spark-submit script can be used to submit a compiled Spark application to the cluster. Spark currently supports two deploy modes for standalone clusters: client and cluster. In client mode, the driver and the client that submits the application are launched in the same process, while in cluster mode, the driver is launched from one of the worker processes and the client process exits as soon as it completes submitting the application (it doesn't have to wait for the application to finish).
When an application is launched through spark-submit, then its JAR file is automatically distributed to all the worker nodes. Any additional JAR that an application depends on should be specified through the jars flag using a comma as a delimiter (for example, jars, jar1, jar2).

As mentioned in the Apache Spark fundamentals section, in standalone mode, the Spark master is a single point of failure. This means that if the Spark master node should go down, the Spark cluster would stop functioning and all currently submitted or running applications would fail, and it wouldn't be possible to submit new applications.

High availability can be configured using Apache ZooKeeper (https://zookeeper.apache.org/), an open source and highly reliable distributed coordination service, or can be deployed as a cluster through Mesos or YARN, which we will talk about in the following two sections.

主站蜘蛛池模板: 公主岭市| 奇台县| 屏东市| 昂仁县| 峨眉山市| 凤山市| 呈贡县| 濮阳市| 四会市| 金溪县| 丹凤县| 沁水县| 江永县| 鄂托克前旗| 苏尼特右旗| 庆城县| 隆尧县| 漳浦县| 特克斯县| 安阳市| 黔江区| 北辰区| 饶阳县| 东方市| 弥勒县| 阜南县| 陕西省| 澄江县| 当涂县| 衡阳县| 汨罗市| 枞阳县| 乌拉特中旗| 黑龙江省| 舞钢市| 都兰县| 安龙县| 河源市| 鄂尔多斯市| 岐山县| 神木县|