
Saving compressed data in HDFS

In this recipe, we are going to take a look at how to store and process compressed data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

It's always good to use compression while storing data in HDFS. HDFS supports various compression algorithms such as LZO, BZIP2, Snappy, GZIP, and so on. Every algorithm has its own pros and cons when you consider the time taken to compress and decompress against the space saved. These days, people prefer Snappy compression as it aims for very high speed with a reasonable amount of compression.
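You can get a feel for this space/time trade-off locally before choosing a codec. The following is a minimal sketch using the standard gzip and bzip2 command-line tools on a throwaway sample file (the file name is illustrative):

```shell
# Create a repetitive ~13 KB sample file to compress.
printf 'hello hadoop %.0s' $(seq 1 1000) > sample.txt

# Compress it with two different codecs, keeping the original.
gzip -c sample.txt > sample.txt.gz
bzip2 -c sample.txt > sample.txt.bz2

# Compare the sizes: a smaller output means better space efficiency.
wc -c sample.txt sample.txt.gz sample.txt.bz2
```

On real data, you would also time the compression step (for example, with `time`) to weigh speed against the size reduction.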

We can easily store and process any number of files in HDFS. To store compressed data, we don't need to make any changes to the Hadoop cluster; you can simply copy compressed files into HDFS the same way you copy any other file. Here is an example:

hadoop fs -mkdir /compressed
hadoop fs -put file.bz2 /compressed

Now, we'll run a sample program to see how Hadoop automatically decompresses the file and processes it:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /compressed /compressed_out

Once the job is complete, you can verify the output.
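Assuming the default reducer output naming, the result can be inspected with the HDFS shell (the exact part-file name may differ on your cluster):

```shell
# List the job output directory and print the word counts.
hadoop fs -ls /compressed_out
hadoop fs -cat /compressed_out/part-r-00000
```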

How it works...

Hadoop probes its native libraries to find support for the various codecs and their implementations. Native libraries are specific to the platform that you run Hadoop on, and you don't need to make any configuration changes to enable these compression algorithms. As mentioned earlier, Hadoop supports several widely used compression algorithms; based on your requirements (saving more space versus spending less time), you can choose the one that fits.
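You can check which native codec libraries your installation actually found with the `checknative` utility (available in Hadoop 2.4 and later); the output varies by platform and build:

```shell
# Report whether the native hadoop library and each codec
# (zlib, snappy, lz4, bzip2) were loaded successfully.
hadoop checknative -a
```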

Take a look at http://comphadoop.weebly.com/ for more information on this.
