官术网_书友最值得收藏!

Benchmarking HDFS using DFSIO

Hadoop contains several benchmarks that you can use to verify whether your HDFS cluster is set up properly and performs as expected. DFSIO is a benchmark test that comes with Hadoop, which can be used to analyze the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read/write performance of an HDFS cluster.

Getting ready

You must set up and deploy HDFS and Hadoop v2 YARN MapReduce prior to running these benchmarks. Locate the hadoop-mapreduce-client-jobclient-*-tests.jar file in your Hadoop installation.

How to do it...

The following steps will show you how to run the write and read DFSIO performance benchmarks:

  1. Execute the following command to run the HDFS write performance benchmark. The –nrFiles parameter specifies the number of files to be written by the benchmark. Use a number high enough to saturate the task slots in your cluster. The -fileSize parameter specifies the file size of each file in MB. Change the location of the hadoop-mapreduce-client-jobclient-*-tests.jar file in the following commands according to your Hadoop installation.
    $ hadoop jar \
    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 32 –fileSize 1000
    
  2. The write benchmark writes the results to the console as well as appending to a file named TestDFSIO_results.log. You can provide your own result filename using the –resFile parameter.
  3. The following step will show you how to run the HDFS read performance benchmark. The read performance benchmark uses the files written by the write benchmark in step 1. Hence, the write benchmark should be executed before running the read benchmark and the files written by the write benchmark should exist in the HDFS for the read benchmark to work properly. The benchmark writes the results to the console and appends the results to a logfile similarly to the write benchmark.
    $hadoop jar \
    $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -read \
    -nrFiles 32 –fileSize 1000
    
  4. The files generated by the preceding benchmarks can be cleaned up using the following command:
    $hadoop jar \
    $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -clean
    

How it works...

DFSIO executes a MapReduce job where the Map tasks write and read the files in parallel, while the Reduce tasks are used to collect and summarize the performance numbers. You can compare the throughput and IO rate results of this benchmark with the total number of disks and their raw speeds to verify whether you are getting the expected performance from your cluster. Please note the replication factor when verifying the write performance results. High standard deviation in these tests may hint at one or more underperforming nodes due to some reason.

There's more...

Running these tests together with monitoring systems can help you identify the bottlenecks of your Hadoop cluster much easily.

主站蜘蛛池模板: 吉木乃县| 都昌县| 葵青区| 平安县| 淄博市| 阳朔县| 台北县| 广饶县| 新郑市| 洪江市| 扶绥县| 平塘县| 拉萨市| 噶尔县| 天峨县| 嘉禾县| 内乡县| 柳江县| 泰安市| 阿图什市| 平定县| 筠连县| 浦城县| 大荔县| 潢川县| 霍城县| 图木舒克市| 沙湾县| 河间市| 蓝山县| 赞皇县| 嵊州市| 石城县| 绥棱县| 和硕县| 波密县| 中卫市| 阿克| 彭水| 上饶县| 周宁县|