- Hadoop Real-World Solutions Cookbook(Second Edition)
- Tanmay Deshpande
- 2021-07-09 20:02:50
Changing the replication factor of an existing file in HDFS
In this recipe, we are going to take a look at how to change the replication factor of a file in HDFS. The default replication factor is 3.
Getting ready
To perform this recipe, you should already have a running Hadoop cluster.
How to do it...
Sometimes, there might be a need to increase or decrease the replication factor of a specific file in HDFS. In this case, we'll use the setrep command.
This is how you can use the command:
hadoop fs -setrep [-R] [-w] <noOfReplicas> <path> ...
In this command, the path can be either a file or a directory; if it's a directory, the command recursively sets the replication factor for all the files under it.
- The -w option makes the command wait until the replication is complete
- The -R option is accepted for backward compatibility and has no effect
First, let's check the replication factor of the file we copied to HDFS in the previous recipe:
hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt
Once you list the file, the output shows the read/write permissions on the file, and the very next field is the replication factor. We have the replication factor set to 3 for our cluster; hence, the number shown is 3.
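The replication factor can also be read out of the listing by a script. The following is a minimal sketch, assuming the column layout of the ls output shown above; the function name replication_from_ls is hypothetical and not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the replication factor found in one line of
# `hadoop fs -ls` output (assumes the column layout shown in this recipe).
replication_from_ls() {
    # Field 1 is the permission string; field 2 is the replication factor.
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Sample line copied from the listing in this recipe.
line='-rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt'
replication_from_ls "$line"
```

On a running cluster, hadoop fs -stat %r /mydir1/LICENSE.txt should report the same value without any parsing.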
Let's change it to 2 using this command:
hadoop fs -setrep -w 2 /mydir1/LICENSE.txt
The command waits until the replication is adjusted. Once it is done, you can verify the change by running the ls command again:
hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 2 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt
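The size shown by ls is the length of the file itself, not of all its copies. As a rough illustration of what the change means for storage, the sketch below multiplies the file length by the replication factor for the two listings in this recipe. The helper name raw_bytes_from_ls is hypothetical, and the estimate ignores any per-block overhead on the DataNodes.

```shell
#!/bin/sh
# Hypothetical helper: estimate the raw bytes held across DataNodes for one
# file as (file length, field 5) * (replication factor, field 2).
# This is an approximation that ignores per-block metadata overhead.
raw_bytes_from_ls() {
    printf '%s\n' "$1" | awk '{ print $2 * $5 }'
}

# Listing lines copied from this recipe, before and after setrep.
before='-rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt'
after='-rw-r--r-- 2 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt'

raw_bytes_from_ls "$before"   # 15429 * 3
raw_bytes_from_ls "$after"    # 15429 * 2
```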
How it works...
Once the setrep command is executed, the NameNode is notified, and it then decides whether replicas need to be added to or removed from certain DataNodes. When you use the -w option, this process may take a long time if the file is large.
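If a long wait with -w is a concern, one option is to put an upper bound on how long the client waits. The following is a minimal sketch, assuming the timeout utility from GNU coreutils is available; the wrapper name run_with_limit is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command but stop waiting after a time limit.
# Relies on `timeout` from GNU coreutils, which exits with status 124
# when the limit is reached.
run_with_limit() {
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up waiting after $limit" >&2
    fi
    return "$status"
}

# Intended use on a cluster (shown as a comment, not executed here):
#   run_with_limit 10m hadoop fs -setrep -w 2 /mydir1/LICENSE.txt
```

This only limits how long the client waits. The assumption here is that the NameNode has already recorded the new replication factor by the time the wait begins, so the adjustment would carry on in the background and can be checked later with the ls command.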