
The Spark shell

Spark supports writing programs interactively using the Scala, Python, or R REPL (that is, the Read-Eval-Print Loop, or interactive shell). The shell provides instant feedback as we enter code, because each expression is evaluated immediately. In the Scala shell, the result and its type are also displayed after each piece of code is run.

To use the Spark shell with Scala, simply run ./bin/spark-shell from the Spark base directory. This launches the Scala shell and initializes a SparkContext, which is available to us as the Scala value sc. As of Spark 2.0, a SparkSession instance is also available in the console as the spark variable.

Your console output should look similar to the following:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:14:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:14:25 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.180 instead (on interface eth1)
16/08/06 22:14:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:14:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/08/06 22:14:27 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.22.180:4041
Spark context available as 'sc' (master = local[*], app id = local-1470546866779).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

To use the Python shell with Spark, simply run the ./bin/pyspark command. As in the Scala shell, the SparkContext object is available as the Python variable sc, and the SparkSession as spark. Your output should be similar to this:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:16:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:16:15 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.180 instead (on interface eth1)
16/08/06 22:16:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:16:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkSession available as 'spark'.
>>>

R is a language and runtime environment for statistical computing and graphics. It is a GNU project and can be regarded as a different implementation of S, a language developed at Bell Laboratories.

R provides a wide variety of statistical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering) as well as graphical techniques, and it is highly extensible.

To use Spark from R, run the following command to open the SparkR shell:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/sparkR
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command /home/ubuntu/work/spark-2.0.0-bin-hadoop2.7/bin/spark-submit "sparkr-shell" /tmp/RtmppzWD8S/backend_porta6366144af4f
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:26:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:26:22 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.186 instead (on interface eth1)
16/08/06 22:26:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:26:22 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

 Welcome to
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
    /_/
 SparkSession available as 'spark'.
During startup - Warning message:
package 'SparkR' was built under R version 3.1.1
>