
Configuring and running Spark on Amazon Elastic MapReduce

You can launch a Hadoop cluster with Spark installed using Amazon Elastic MapReduce (EMR). Perform the following steps to create an EMR cluster with Spark installed:

  1. Launch an Amazon EMR cluster.
  2. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
  3. Choose Create cluster:
  4. Choose an appropriate Amazon AMI version (3.9.0 or later), as shown in the following screenshot:
  5. For the Applications to be installed field, choose Spark 1.5.2 or later from the list shown in the user interface and click on Add.
  6. Select the other hardware options as necessary:
    • The instance type
    • The key pair to be used with SSH
    • Permissions
    • IAM roles (Default or Custom)

Refer to the following screenshot:

  7. Click on Create cluster. The cluster will begin provisioning, as shown in the following screenshot:
  8. Log in to the master node. Once the EMR cluster is ready, you can SSH into the master:
     $ ssh -i rd_spark-user1.pem hadoop@ec2-52-3-242-138.compute-1.amazonaws.com

The output will be similar to the following listing:
     Last login: Wed Jan 13 10:46:26 2016

            __|  __|_  )
            _|  (     /   Amazon Linux AMI
           ___|\___|___|

     https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
     23 package(s) needed for security, out of 49 available
     Run "sudo yum update" to apply all updates.
     [hadoop@ip-172-31-2-31 ~]$
  9. Start the Spark shell (a quick sanity check follows the startup listing below):
      [hadoop@ip-172-31-2-31 ~]$ spark-shell
     16/01/13 10:49:36 INFO SecurityManager: Changing view acls to: hadoop
     16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to: hadoop
     16/01/13 10:49:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
     16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
     16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP class server' on port 60523.

     Welcome to
           ____              __
          / __/__  ___ _____/ /__
         _\ \/ _ \/ _ `/ __/  '_/
        /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
           /_/

     scala> sc
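Before running the sample, you can confirm that the shell is wired up correctly. The following is a minimal sanity check at the scala> prompt, not part of the original recipe; the values printed depend on your cluster:

     scala> sc.version   // the Spark release installed on the cluster, for example 1.5.2
     scala> sc.appName   // the application name registered for this shell session
     scala> sc.master    // the cluster manager this shell is connected to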
  10. Run a basic Spark sample on the EMR cluster (a short follow-up sketch appears after the output):
     scala> val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")

     scala> val linesWithCartoonNetwork = textFile.filter(line => line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
     linesWithCartoonNetwork: Long = 9
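The textFile RDD can be reused for further exploration. The following is a minimal follow-up sketch, not part of the original recipe: it marks the impression log for caching so that later actions reuse the data already read from S3 (cache() is lazy, so the data is materialized by the first action that runs afterwards) and counts the total number of log lines for comparison with the nine cartoonnetwork.com matches above:

     scala> val impressions = textFile.cache()   // mark the RDD for in-memory caching

     scala> impressions.count()                  // total log lines, versus the filtered count above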