Setting the HDFS block size for a specific file in a cluster

In this recipe, we will look at how to set the block size for a single file rather than for the whole cluster.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

In the previous recipe, we learned how to change the block size at the cluster level, but that is not always required. HDFS also lets us set the block size for a single file. Note that on Hadoop 2.x and later, dfs.block.size is a deprecated name for dfs.blocksize; either is accepted. The following command copies a file called myfile to HDFS, setting its block size to 1 MB (1048576 bytes):

hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile /

Once the file is copied, you can verify that it was stored with a 1 MB block size by checking how many blocks it was split into:

hdfs fsck -blocks /myfile
 Connecting to namenode via http://localhost:50070/fsck?ugi=ubuntu&blocks=1&path=%2Fmyfile
 FSCK started by ubuntu (auth:SIMPLE) from /127.0.0.1 for path /myfile at Thu Oct 29 14:58:00 UTC 2015
 .Status: HEALTHY
 Total size: 17276808 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 17 (avg. block size 1016282 B)
 Minimally replicated blocks: 17 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 3
 Number of racks: 1
 FSCK ended at Thu Oct 29 14:58:00 UTC 2015 in 2 milliseconds

 The filesystem under path '/myfile' is HEALTHY
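
As a quick sanity check on the numbers in this report, here is a minimal sketch in plain shell arithmetic. It does not call Hadoop; the helper names block_count and last_block_size are hypothetical and exist only for this illustration. With a 1 MB block size, the 17276808-byte file should occupy 16 full blocks plus one partial block, which matches the 17 blocks and the average block size that fsck reports.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop): predict the block layout
# that fsck should report for a file of a given size.

# Number of blocks = ceiling(file_size / block_size).
block_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Size of the final block: the remainder, or a full block if the
# file size is an exact multiple of the block size.
last_block_size() {
    rem=$(( $1 % $2 ))
    if [ "$rem" -eq 0 ] && [ "$1" -gt 0 ]; then
        echo "$2"
    else
        echo "$rem"
    fi
}

# Values taken from the fsck output above.
file_size=17276808
block_size=1048576

blocks=$(block_count "$file_size" "$block_size")
echo "blocks: $blocks"
echo "last block: $(last_block_size "$file_size" "$block_size") B"
echo "avg block size: $(( file_size / blocks )) B"
```

The average block size is below 1048576 B only because the last block is partial; every other block is exactly 1 MB.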

How it works...

When we specify the block size while copying a file, it overrides the default block size for that file only, and HDFS stores the file in blocks of the given size. Such changes are generally made as part of other optimizations, so make them only when you understand their consequences. If the block size is too small, the file is split into more blocks, which increases parallelism but also increases the load on the NameNode, because it has to track more entries in the FSImage. On the other hand, if the block size is too big, it reduces parallelism and can degrade processing performance.
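
To make the trade-off concrete, the following rough sketch counts how many blocks a single file needs at several block sizes. The 1 GiB file size, the list of block sizes, and the helper name blocks_for are assumptions chosen for illustration, not values from the recipe. Each block is one more entry for the NameNode to track and, in a typical job, one more unit of work that can be processed in parallel.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): blocks needed for a file,
# computed as ceiling(file_size / block_size).
blocks_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

mib=$(( 1024 * 1024 ))
file_size=$(( 1024 * mib ))   # assumed example: a 1 GiB file

# Compare a very small block size with some larger ones.
for block_mib in 1 64 128 256; do
    n=$(blocks_for "$file_size" $(( block_mib * mib )))
    echo "block size ${block_mib} MiB -> ${n} blocks"
done
```

Under these assumptions, going from 128 MiB blocks to 1 MiB blocks multiplies the number of blocks for this one file by 128, which is why very small block sizes are usually reserved for specific files rather than applied cluster-wide.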
