
Benchmarking HDFS using DFSIO

Hadoop contains several benchmarks that you can use to verify whether your HDFS cluster is set up properly and performs as expected. DFSIO is one such benchmark bundled with Hadoop; it measures the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read/write performance of an HDFS cluster.

Getting ready

You must set up and deploy HDFS and Hadoop v2 YARN MapReduce prior to running these benchmarks. Locate the hadoop-mapreduce-client-jobclient-*-tests.jar file in your Hadoop installation.
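The jar's file name contains the Hadoop version, so a glob search is a convenient way to locate it. The following is a minimal sketch, assuming `HADOOP_HOME` points at your installation (as in the commands below). The function name `find_dfsio_jar` is hypothetical and is not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the MapReduce jobclient tests jar
# (the jar that contains TestDFSIO) under the given Hadoop home directory.
find_dfsio_jar() {
  local hadoop_home="${1:-$HADOOP_HOME}"
  if [ -z "$hadoop_home" ] || [ ! -d "$hadoop_home" ]; then
    echo "Hadoop home directory not found: '$hadoop_home'" >&2
    return 1
  fi
  # Search the whole installation: the jar usually lives under
  # share/hadoop/mapreduce, but layouts can differ between distributions.
  local jar
  jar=$(find "$hadoop_home" -type f -name 'hadoop-mapreduce-client-jobclient-*-tests.jar' | head -n 1)
  if [ -z "$jar" ]; then
    echo "No jobclient tests jar found under $hadoop_home" >&2
    return 1
  fi
  echo "$jar"
}
```

For example, `JAR=$(find_dfsio_jar) || exit 1` stores the path in a (hypothetical) `JAR` variable that you can use in place of the glob in the following commands.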

How to do it...

The following steps will show you how to run the write and read DFSIO performance benchmarks:

  1. Execute the following command to run the HDFS write performance benchmark. The -nrFiles parameter specifies the number of files to be written by the benchmark. DFSIO processes each file in its own map task, so use a number high enough to saturate the task slots (containers) in your cluster. The -fileSize parameter specifies the size of each file in MB. Adjust the location of the hadoop-mapreduce-client-jobclient-*-tests.jar file in the following commands to match your Hadoop installation.
    $ hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 32 -fileSize 1000
    
  2. The write benchmark prints its results to the console and also appends them to a file named TestDFSIO_results.log. You can specify a different result filename using the -resFile parameter.
  3. Run the HDFS read performance benchmark using the following command. The read benchmark reads the files created by the write benchmark in step 1. Hence, run the write benchmark first and make sure its files still exist in HDFS, or the read benchmark will not work properly. Like the write benchmark, the read benchmark prints its results to the console and appends them to the result file.
    $ hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -read \
    -nrFiles 32 -fileSize 1000
    
  4. The files generated by the preceding benchmarks can be cleaned up using the following command:
    $ hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -clean
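The three steps can be chained in a small script so that the read benchmark runs only after a successful write, and the cleanup runs last. This is a minimal sketch and not part of Hadoop. The variable names `NR_FILES`, `FILE_SIZE_MB`, `RES_FILE` and `DRY_RUN` and the function names are hypothetical. `estimate_nr_files` is a rough, memory-only guess at how many map containers can run at once: nodes multiplied by floor(`yarn.nodemanager.resource.memory-mb` / `mapreduce.map.memory.mb`). It ignores vcores and the ApplicationMaster container, so treat the result as a starting point.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the write, read and clean steps above.
# NR_FILES, FILE_SIZE_MB, RES_FILE and DRY_RUN are illustrative names, not Hadoop settings.
NR_FILES="${NR_FILES:-32}"
FILE_SIZE_MB="${FILE_SIZE_MB:-1000}"
RES_FILE="${RES_FILE:-TestDFSIO_results.log}"

# Rough number of map containers that can run concurrently, memory only:
# nodes * floor(NodeManager memory MB / map container memory MB).
estimate_nr_files() {
  local nodes="$1" nm_mem_mb="$2" map_mem_mb="$3"
  echo $(( nodes * (nm_mem_mb / map_mem_mb) ))
}

# Run one TestDFSIO invocation, or only print it when DRY_RUN=1.
run_dfsio() {
  local jar="$1"; shift
  local cmd=(hadoop jar "$jar" TestDFSIO "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Write first, because the read step needs the files the write step created;
# remove the files only after both benchmarks have succeeded.
dfsio_all() {
  local jar="$1"
  run_dfsio "$jar" -write -nrFiles "$NR_FILES" -fileSize "$FILE_SIZE_MB" -resFile "$RES_FILE" || return 1
  run_dfsio "$jar" -read -nrFiles "$NR_FILES" -fileSize "$FILE_SIZE_MB" -resFile "$RES_FILE" || return 1
  run_dfsio "$jar" -clean
}
```

For example, `DRY_RUN=1 dfsio_all "$JAR"` prints the three commands without running them. Drop `DRY_RUN=1` to execute them, and prefix `NR_FILES=...` to override the file count for a single run.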
    

How it works...

DFSIO executes a MapReduce job in which the map tasks write or read the files in parallel, while the reduce phase collects and summarizes the performance numbers. You can compare the throughput and IO rate results of this benchmark with the total number of disks and their raw speeds to verify whether your cluster delivers the expected performance. Take the replication factor into account when evaluating the write results. Every block written to HDFS is stored on as many disks as the replication factor (3 by default), so the achievable write throughput is correspondingly lower than the aggregate raw disk bandwidth. A high standard deviation in these tests may indicate one or more underperforming nodes, for example nodes with slow or failing disks.
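The comparison with raw disk speed can be scripted against the result file. This is a minimal sketch with hypothetical function names. It assumes the result layout printed by Hadoop 2.x TestDFSIO: each block starts with a line like `----- TestDFSIO ----- : write`, followed by `key: value` lines such as `Total MBytes processed`, `IO rate std deviation` and `Test exec time sec`. Check this against your own log, because the format may differ between versions. The job-level rate computed here (total MB divided by test execution time) includes job startup overhead, so it somewhat understates the true I/O rate. The ceiling divides the aggregate disk bandwidth by the replication factor for writes. For reads, pass a replication factor of 1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking TestDFSIO results.
# Assumes the Hadoop 2.x layout: a "----- TestDFSIO ----- : <op>" header
# followed by "key: value" lines.

# Print the value of FIELD from the last OP ("write" or "read") block in LOG.
dfsio_field() {
  local log="$1" op="$2" field="$3"
  awk -v op="$op" -v field="$field" '
    /----- TestDFSIO ----- :/ { in_op = ($NF == op); next }
    in_op {
      i = index($0, ":")
      if (i > 0) {
        key = substr($0, 1, i - 1); val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == field) last = val
      }
    }
    END { if (last == "") exit 1; print last }
  ' "$log"
}

# Job-level MB/s: total data processed divided by wall-clock test time.
dfsio_aggregate_mbps() {
  local log="$1" op="$2" mb secs
  mb=$(dfsio_field "$log" "$op" "Total MBytes processed") || return 1
  secs=$(dfsio_field "$log" "$op" "Test exec time sec") || return 1
  awk -v mb="$mb" -v s="$secs" 'BEGIN { if (s <= 0) exit 1; printf "%.1f\n", mb / s }'
}

# Upper bound from raw disk speed: every MB written lands on REPLICATION disks.
disk_ceiling_mbps() {
  local disks="$1" disk_mbps="$2" replication="${3:-1}"
  awk -v d="$disks" -v r="$disk_mbps" -v rep="$replication" 'BEGIN { printf "%.1f\n", d * r / rep }'
}
```

For example, compare `dfsio_aggregate_mbps TestDFSIO_results.log write` with `disk_ceiling_mbps <number of disks> <MB/s per disk> 3`. To check the spread across tasks, use `dfsio_field TestDFSIO_results.log write "IO rate std deviation"`.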

There's more...

Running these tests while observing the cluster with your monitoring systems can help you identify the bottlenecks of your Hadoop cluster much more easily.
