
Introduction

Hadoop has been the primary platform for many people who deal with big data problems; it is at the heart of big data. Hadoop was developed between 2003 and 2004, after Google published its research papers on the Google File System (GFS) and MapReduce. Hadoop was structured around the core ideas of these papers, and took its shape from them. With the advancement of the Internet and social media, people gradually realized the power that Hadoop offered, and it soon became the leading platform for handling big data. Thanks to a great deal of hard work from dedicated contributors and open source groups, Hadoop 1.0 was released, and the IT industry welcomed it with open arms.

A lot of companies started using Hadoop as the primary platform for their data warehousing and Extract-Transform-Load (ETL) needs. They began deploying thousands of nodes in a Hadoop cluster and ran into scalability issues beyond roughly 4,000 nodes. This was because the JobTracker could not handle that many TaskTrackers, and there was also a need for high availability to ensure that clusters were reliable to use. This gave birth to Hadoop 2.0.

In this introductory chapter, we are going to cover interesting recipes such as installing a single-node or multi-node Hadoop 2.0 cluster, benchmarking it, adding new nodes to an existing cluster, and so on. So, let's get started.
