
Running Hadoop in standalone mode

Now that you have successfully unzipped Hadoop, let's try to run a Hadoop program in standalone mode. As we mentioned in the introduction, Hadoop's standalone mode does not require any running daemons; you can run your MapReduce program directly from your compiled jar. We will look at how to write MapReduce programs in Chapter 4, Developing MapReduce Applications. For now, it's time to run a program we have already prepared. To download, compile, and run the sample program, simply take the following steps:

Please note that this is not a mandatory requirement for setting up Apache Hadoop. You do not need Maven or Git set up to compile or run Hadoop; we are only using them to build and run some simple examples.
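If you want to check up front whether either tool is already present on your machine, a quick, optional shell check such as the following will tell you (the loop and messages are just an illustration, not part of the book's setup):

```shell
# Optional sanity check: report whether mvn and git are on the PATH
for tool in mvn git; do
  if command -v "$tool" >/dev/null 2>&1; then
    status="installed"
  else
    status="not installed yet"
  fi
  echo "$tool: $status"
done
```

If either tool reports as missing, the apt-get commands in the following steps will install it.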
  1. You will need Maven and Git on your machine to proceed. Apache Maven can be set up with the following command:
     hadoop@base0:/$ sudo apt-get install maven
  2. This will install Maven on your local machine. Try running the mvn command to check that it has been installed properly. Now, install Git on your local machine with the following command:
      hadoop@base0:/$ sudo apt-get install git
  3. Now, create a folder in your home directory (such as src/) to keep all the examples, and then run the following command to clone the Git repository locally:
      hadoop@base0:/$ git clone https://github.com/PacktPublishing/Apache-Hadoop-3-Quick-Start-Guide/ src/
  4. The preceding command will create a local copy of the repository. Now go to the folder 2/, which contains the examples relevant to Chapter 2, Planning and Setting Up Hadoop Clusters.
  5. Now run the following mvn command from the 2/ folder. This will start downloading the artifacts from the internet that the example project depends on, as shown in the next screenshot:
      hadoop@base0:/$ mvn

  6. Finally, you will get a build successful message. This means the jar containing the example has been created and is ready to go. The next step is to use this jar to run the sample program, which, in this case, provides a utility that allows users to supply a regular expression. The MapReduce program will then search across the given folder and report the matched content and its count.
  7. Let's now create an input folder and copy some documents into it. We will use a simple expression to get all the words that are separated by at least one whitespace character. In that case, the expression will be \\s+. (Please refer to the standard Java documentation for information on how to create regular expressions for string patterns.)
  8. Create a folder in which you can put sample text files for expression matching. Similarly, create an output folder to save the output. To run the program, run the following command:
      hadoop@base0:/$ <hadoop-home>/bin/hadoop jar <location-of-generated-jar> ExpressionFinder "\\s+" <folder-containing-input-files> <new-output-folder> > stdout.txt

In most cases, the jar will be located in the target folder inside the project's home directory. The command will create a MapReduce job, run the program, and then write the output to the given output folder. A successful run should end with no errors, as shown in the following screenshot:
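If you want to see what the \\s+ expression matches before involving Hadoop at all, you can mimic the tokenization with plain shell tools. This is only an illustration of the pattern, not of the MapReduce job itself: grep -oE with the complementary character class [^[:space:]]+ extracts exactly the tokens that \\s+ separates (the sample file and its contents are made up for this demonstration):

```shell
# Create a tiny sample file (illustrative content only)
printf 'hello world\nhadoop  standalone mode\n' > sample.txt

# Tokens are the runs of non-whitespace that \s+ separates
grep -oE '[^[:space:]]+' sample.txt

# Count the tokens: 5 in total for the sample above
grep -oE '[^[:space:]]+' sample.txt | wc -l
```

Note how the double space between hadoop and standalone still yields two separate tokens, because \\s+ matches one or more whitespace characters.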

Similarly, the output folder will contain the files part-r-00000 and _SUCCESS. The file part-r-00000 should contain the output of your expression run on multiple files. You can play with other regular expressions if you wish. Here, we have simply run a regular expression program that can run over masses of files in a completely distributed manner. We will move on to look at the programming aspects of MapReduce in Chapter 4, Developing MapReduce Applications.
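As a recap of that output layout, the following sketch fabricates a miniature output folder of the same shape, purely for illustration, and then inspects it the way you would inspect a real one. The folder name and file contents here are made up; substitute the actual output path from your own run:

```shell
# Fabricated stand-in for a real MapReduce output folder (illustration only)
OUT=./sample-output                 # replace with your actual output folder
mkdir -p "$OUT"
printf 'hadoop\t3\nstandalone\t1\n' > "$OUT/part-r-00000"   # made-up results
: > "$OUT/_SUCCESS"                 # _SUCCESS is an empty marker file

# A real output folder is inspected the same way:
if [ -f "$OUT/_SUCCESS" ]; then
  echo "job completed successfully"
fi
cat "$OUT/part-r-00000"
```

The presence of the empty _SUCCESS marker is what tools conventionally check to confirm the job finished, while the part-r-00000 file holds the reducer's results.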
