
Enabling transparent encryption for HDFS

When handling sensitive data, it is important to consider security measures. Hadoop allows us to encrypt sensitive data stored in HDFS. In this recipe, we are going to see how to encrypt data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

For many applications that hold sensitive data, it is very important to adhere to standards such as PCI DSS, HIPAA, FISMA, and so on. To enable this, HDFS provides a feature called encryption zones: directories in which data is transparently encrypted on write and decrypted on read.

To use this encryption facility, we first need to enable Hadoop Key Management Server (KMS):

/usr/local/hadoop/sbin/kms.sh start

This starts KMS in the Tomcat web server bundled with Hadoop.
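A quick way to confirm KMS is up is to query its REST API (a sketch, assuming the default port of 16000, pseudo authentication, and the ubuntu user used later in this recipe):

```shell
# List key names via the KMS REST API; an empty JSON list
# means KMS is running but no keys have been created yet.
curl "http://localhost:16000/kms/v1/keys/names?user.name=ubuntu"
```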

Next, we need to add the following properties to core-site.xml and hdfs-site.xml.

In core-site.xml, add the following property:

<property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

In hdfs-site.xml, add the following property:

<property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

Restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption:

hadoop key create mykey

This creates a key and stores it in KMS. Next, we have to create an encryption zone, which is a directory in HDFS where all data is stored encrypted:
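To confirm the key was created, we can list the keys known to KMS along with their metadata:

```shell
# Show all key names with their cipher, key length, and creation date
hadoop key list -metadata
```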

hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone
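We can then verify that the zone was created (a sketch, assuming the HDFS daemons are running as the superuser):

```shell
# List all encryption zones and the key backing each one
hdfs crypto -listZones
```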

We then change the ownership of the zone to the current user (here, ubuntu):

hadoop fs -chown ubuntu:ubuntu /zone

Any file we put into this directory is encrypted on write and transparently decrypted when read back:

hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile
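To convince ourselves that the data really is encrypted at rest, we can read the raw bytes through HDFS's /.reserved/raw prefix, which bypasses decryption (this requires HDFS superuser privileges):

```shell
# Plaintext, decrypted transparently on read
hadoop fs -cat /zone/myfile
# Raw ciphertext as stored on the DataNodes (superuser only)
hadoop fs -cat /.reserved/raw/zone/myfile
```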

How it works...

Encryption can be applied at various levels in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption.

HDFS transparent encryption sits between database-level and file-level encryption. KMS acts as a proxy between HDFS clients (and the NameNode) and the underlying key provider, communicating over an HTTP REST API. Two types of keys are used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). Each file is encrypted with its own DEK; the DEK is in turn encrypted with the zone's EZK, producing an Encrypted Data Encryption Key (EDEK), which is stored on the NameNode.

When a file is written to an HDFS encryption zone, the client receives an EDEK from the NameNode and asks KMS to decrypt it (using the EZK) into a DEK, which the client then uses to encrypt the data before storing it in HDFS (the encryption zone).

When an encrypted file is read, the client again obtains the EDEK from the NameNode and has KMS decrypt it into the DEK, which is used to decrypt the data. Thus, encryption and decryption are handled automatically by HDFS, and the end user does not need to worry about executing this on their own.
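The EZK/DEK/EDEK flow described above is an instance of envelope encryption, which we can sketch with openssl. This is only an illustration of the concept, not what HDFS does internally; the passphrase ezk-secret stands in for the EZK held by KMS:

```shell
# Generate a random 128-bit DEK (hex-encoded)
DEK=$(openssl rand -hex 16)

# "Wrap" the DEK to produce the EDEK; in HDFS this wrapping is done
# by KMS with the EZK (a passphrase stands in for the EZK here)
EDEK=$(printf '%s' "$DEK" | openssl enc -aes-256-cbc -pbkdf2 -a -pass pass:ezk-secret)

# At read/write time, the client asks KMS to unwrap the EDEK back into the DEK
RECOVERED=$(printf '%s' "$EDEK" | openssl enc -d -aes-256-cbc -pbkdf2 -a -pass pass:ezk-secret)

# The recovered DEK matches the original, so the file data can be decrypted
[ "$DEK" = "$RECOVERED" ] && echo "DEK recovered"
```

Note that only the EDEK is ever persisted on the NameNode; the plaintext DEK exists only in client memory, and the EZK never leaves KMS.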

Note

You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.
