
Running the WordCount program in a distributed cluster environment

This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.

Getting ready

Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.

How to do it...

Now let's run the WordCount sample in the distributed Hadoop v2 setup:

  1. Upload the wc-input directory from the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents:
    $ hdfs dfs -copyFromLocal wc-input .
    
  2. Execute the WordCount example from the HADOOP_HOME directory:
    $ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
    
  3. Run the following commands to list the output directory and then look at the results:
    $ hdfs dfs -ls wc-output
    Found 3 items
    -rw-r--r--   1 joe supergroup          0 2013-11-09 09:04 wc-output/_SUCCESS
    drwxr-xr-x   - joe supergroup          0 2013-11-09 09:04 wc-output/_logs
    -rw-r--r--   1 joe supergroup       1306 2013-11-09 09:04 wc-output/part-r-00000
    
    $ hdfs dfs -cat wc-output/part*
    

How it works...

When we submit a job, YARN schedules a MapReduce ApplicationMaster to coordinate and execute the computation. The ApplicationMaster requests the necessary resources from the ResourceManager and executes the map and reduce tasks of the MapReduce computation inside the containers it receives from that resource request.
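The data flow of the computation itself can be mimicked locally with ordinary shell tools. This is only an illustrative sketch of the map, shuffle/sort, and reduce phases that WordCount performs, not how the cluster actually executes them:

```shell
# Illustrative local sketch of WordCount's MapReduce phases
printf 'to be or not to be\n' |
  tr -s ' ' '\n' |              # map: split input into one word per line
  awk '{print $0 "\t" 1}' |     # map: emit a (word, 1) pair per word
  sort |                        # shuffle/sort: group the pairs by key
  awk -F'\t' '{count[$1] += $2}
       END {for (w in count) print w "\t" count[w]}'  # reduce: sum per key
```

The final output has the same "word<TAB>count" shape as the part-r-00000 file, although the ordering of the reduced keys may differ.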

There's more...

You can also see the results of the WordCount application through the HDFS monitoring UI by visiting http://NAMENODE:50070.
