
HDFS health and FSCK

The health of the filesystem is very important for data retrieval and optimal performance. In a distributed system, it becomes more critical to maintain the good health of the HDFS filesystem so as to ensure block replication and near-parallel streaming of data blocks.

In this recipe, we will see how to check the health of the filesystem and do repairs, if any are needed.

Getting ready

Make sure you have a running cluster that has already been up for a few days with data. We can run the commands on a new cluster as well, but for the sake of this lab, it will give you more insights if it is run on a cluster with a large dataset.

How to do it...

  1. Connect over SSH to the master1.cyrus.com NameNode and switch to the hadoop user.
  2. To check the HDFS root filesystem, execute the hdfs fsck / command, as shown in the following screenshot:
  3. We can also check the status of a single file instead of the entire filesystem by passing its path to hdfs fsck, as shown in the following screenshot:
  4. The output of the fsck command will show the blocks for a file, the replication status, whether blocks are corrupted, and many more details, as shown in the following screenshot:
  5. We can also look at how the blocks of a file are laid out across the cluster, for example with the -files, -blocks, and -locations options of hdfs fsck, as shown in the following screenshot:
  6. In the cluster named cyrus, you can see that there are some corrupt blocks. We can simulate this by manually deleting a block of a file from the underlying local filesystem on a DataNode. Each HDFS block is stored as an ordinary file on the underlying filesystem, such as EXT4.
  7. Files with corrupt blocks can be dealt with by deleting them. For an under-replicated block, we can use the hdfs dfs -setrep 2 /input/new.txt command to set that particular file to the desired replication factor. To set many files to a given replication factor, loop through the list and run setrep on each one.
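
The steps above can be sketched as a few small shell wrappers. This is only an illustration under stated assumptions: the function names are hypothetical, the list file passed to set_replication_from_list is assumed to hold one HDFS path per line, and the data directory passed to find_block_files is whatever dfs.datanode.data.dir points to on the DataNode in question. The fsck options used (-files, -blocks, -locations, -list-corruptfileblocks) are standard ones, but the screenshots in this recipe may have used a different combination.

```shell
#!/usr/bin/env bash
# Sketch of the recipe's checks as thin wrappers around the hdfs CLI.
# All function names here are hypothetical.

# Steps 2-3: check the whole namespace (default "/") or a single path.
check_path() {
  hdfs fsck "${1:-/}"
}

# Steps 4-5: list files, their blocks, and the DataNodes holding each replica.
show_block_layout() {
  hdfs fsck "$1" -files -blocks -locations
}

# Step 6: list the files that currently have corrupt blocks.
list_corrupt() {
  hdfs fsck "${1:-/}" -list-corruptfileblocks
}

# Step 6: on a DataNode, find the local file(s) behind a block ID such as
# blk_<id>. A block is stored as blk_<id> plus a blk_<id>_<genstamp>.meta file.
find_block_files() {
  local data_dir="$1" block_id="$2"
  find "$data_dir" -type f \( -name "$block_id" -o -name "${block_id}_*.meta" \)
}

# Step 7: run setrep on every path in a list file (one path per line).
set_replication_from_list() {
  local factor="$1" list_file="$2" path
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # skip blank lines
    # </dev/null keeps the command from consuming the rest of the list
    hdfs dfs -setrep "$factor" "$path" </dev/null
  done < "$list_file"
}
```

For example, after sourcing the file, `show_block_layout /input/new.txt` prints the block layout of a single file, and `set_replication_from_list 2 paths.txt` applies a replication factor of 2 to every path in the hypothetical paths.txt.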

How it works...

The hdfs fsck / command is similar to the Linux fsck command. In Hadoop, however, it does not repair the filesystem automatically; any repair needs manual intervention. To see what options there are for this command, use the hdfs fsck -help command.
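
Because fsck only reports problems, the repair action has to be chosen explicitly. The following is a minimal sketch of that decision, not a definitive procedure. The function names are hypothetical, and is_healthy assumes that the fsck report contains a summary line ending in "is HEALTHY" when no problems are found. The -move option moves corrupt files to /lost+found, and -delete removes them.

```shell
#!/usr/bin/env bash
# Sketch: fsck reports, the operator decides. Function names are hypothetical.

# Succeeds when the fsck report read from stdin contains the healthy summary.
# Assumption: a healthy report includes a line such as
#   The filesystem under path '/' is HEALTHY
is_healthy() {
  grep -q "is HEALTHY"
}

# Apply one explicit action to the corrupt files under a path (default "/").
#   move   -> hdfs fsck <path> -move    (corrupt files go to /lost+found)
#   delete -> hdfs fsck <path> -delete  (corrupt files are removed)
repair_corrupt() {
  local action="$1" path="${2:-/}"
  case "$action" in
    move)   hdfs fsck "$path" -move ;;
    delete) hdfs fsck "$path" -delete ;;
    *)      echo "usage: repair_corrupt move|delete [path]" >&2; return 2 ;;
  esac
}
```

A cautious way to use it is `hdfs fsck / | is_healthy || repair_corrupt move /`, which leaves the filesystem alone when it is healthy and otherwise moves the affected files aside rather than deleting them.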

See also

  • The Configuring rack awareness recipe