
Getting started with Spark

In this section, we will run Apache Spark in local (standalone) mode. First, we will set up Scala, which is a prerequisite for Apache Spark. After the Scala setup, we will set up and run Apache Spark, and perform some basic operations on it. So let's start.

Since Apache Spark is written in Scala, Scala needs to be set up on the system. You can download Scala from http://www.scala-lang.org/download/ (we will set up Scala 2.11.8 in the following examples).

Once Scala is downloaded, we can set it up on a Linux system as follows:
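The exact archive name depends on the version you downloaded; assuming the scala-2.11.8.tgz tarball from the link above, a typical setup looks like this:

```shell
# Extract the Scala archive and move it to /usr/local; this matches the
# SCALA_HOME path used below (adjust if you choose a different location).
tar -zxf scala-2.11.8.tgz
sudo mv scala-2.11.8 /usr/local/scala-2.11.8
```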

It is also recommended to set the SCALA_HOME environment variable and add the Scala binaries to the PATH variable. You can set them in the .bashrc file or the /etc/environment file as follows:

export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$PATH:/usr/local/scala-2.11.8/bin

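To confirm that the Scala setup worked, you can print the installed version; this assumes the PATH change above has been applied (for example, by sourcing the .bashrc file):

```shell
# Verify that Scala is on the PATH; prints something like:
# Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
scala -version
```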

Now, we have set up a Scala environment successfully. So, it is time to download Apache Spark. You can download it from http://spark.apache.org/downloads.html.

The Spark version can be different, as per your requirements.

After Apache Spark is downloaded, run the following commands to set it up:

tar -zxf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

The directory location can be different, as per the user's requirements.

You can also set the SPARK_HOME environment variable. It is not mandatory; however, it helps the user find the Spark installation directory. You can also add the path of the Spark binaries to the PATH variable, so that they can be accessed without specifying their full path:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:/usr/local/scala-2.11.8/bin:$SPARK_HOME/bin

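After reloading the updated file, you can quickly confirm that Spark is reachable by printing its version; a minimal sketch, assuming the variables were set in .bashrc as above:

```shell
# Reload the environment and confirm the Spark binaries are on the PATH;
# this prints the Spark version banner without starting a full session.
source ~/.bashrc
spark-shell --version
```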

Now, we are ready to start Spark in standalone mode. Let's run the following command to start it:

$SPARK_HOME/bin/spark-shell

Alternatively, we can simply execute the spark-shell command, as the Spark binaries have been added to the PATH environment variable.
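As a quick check that the shell works, you can run a basic operation in it. This is a minimal sketch, assuming spark-shell is on the PATH; it pipes two Scala expressions into the shell, and the result appears among Spark's log output:

```shell
# Create an RDD from a local collection inside spark-shell and sum its
# elements; sc is the SparkContext that spark-shell provides by default.
spark-shell <<'EOF'
val numbers = sc.parallelize(1 to 10)
println("Sum: " + numbers.reduce(_ + _))
EOF
```

Among the log lines, you should see the line Sum: 55 printed by the second expression.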


You can also access the Spark driver's UI at http://localhost:4040.

We will discuss the Spark UI in more detail in the Spark Driver Web UI section of this chapter.

In this section, we have completed the Spark setup in standalone mode. In the next section, we will get some hands-on experience with Apache Spark using spark-shell.
