官术网_书友最值得收藏!

Benchmarking HDFS using DFSIO

Hadoop contains several benchmarks that you can use to verify whether your HDFS cluster is set up properly and performs as expected. DFSIO is a benchmark test that comes with Hadoop, which can be used to analyze the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read/write performance of an HDFS cluster.

Getting ready

You must set up and deploy HDFS and Hadoop v2 YARN MapReduce prior to running these benchmarks. Locate the hadoop-mapreduce-client-jobclient-*-tests.jar file in your Hadoop installation.

How to do it...

The following steps will show you how to run the write and read DFSIO performance benchmarks:

  1. Execute the following command to run the HDFS write performance benchmark. The –nrFiles parameter specifies the number of files to be written by the benchmark. Use a number high enough to saturate the task slots in your cluster. The -fileSize parameter specifies the file size of each file in MB. Change the location of the hadoop-mapreduce-client-jobclient-*-tests.jar file in the following commands according to your Hadoop installation.
    $ hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 32 –fileSize 1000
    
  2. The write benchmark writes the results to the console as well as appending to a file named TestDFSIO_results.log. You can provide your own result filename using the –resFile parameter.
  3. The following step will show you how to run the HDFS read performance benchmark. The read performance benchmark uses the files written by the write benchmark in step 1. Hence, the write benchmark should be executed before running the read benchmark and the files written by the write benchmark should exist in the HDFS for the read benchmark to work properly. The benchmark writes the results to the console and appends the results to a logfile similarly to the write benchmark.
    $hadoop jar \
    $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -read \
    -nrFiles 32 –fileSize 1000
    
  4. The files generated by the preceding benchmarks can be cleaned up using the following command:
    $hadoop jar \
    $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -clean
    

How it works...

DFSIO executes a MapReduce job where the Map tasks write and read the files in parallel, while the Reduce tasks are used to collect and summarize the performance numbers. You can compare the throughput and IO rate results of this benchmark with the total number of disks and their raw speeds to verify whether you are getting the expected performance from your cluster. Please note the replication factor when verifying the write performance results. High standard deviation in these tests may hint at one or more underperforming nodes due to some reason.

There's more...

Running these tests together with monitoring systems can help you identify the bottlenecks of your Hadoop cluster much easily.

主站蜘蛛池模板: 深州市| 柏乡县| 西昌市| 资中县| 宣恩县| 黄浦区| 台安县| 三台县| 四平市| 开封县| 漯河市| 简阳市| 呼图壁县| 犍为县| 曲水县| 宁波市| 巨野县| 衡阳县| 名山县| 汕尾市| 中西区| 南开区| 牙克石市| 景洪市| 白水县| 西华县| 承德市| 乐至县| 尉氏县| 柏乡县| 丹巴县| 芜湖县| 玛纳斯县| 加查县| 白朗县| 松滋市| 富裕县| 文安县| 玉溪市| 肥东县| 肃宁县|