
Overview of the Hadoop ecosystem

Apache released Hadoop version 1.0.0 in 2011. It contained only HDFS and MapReduce. From the very beginning, Hadoop was designed as both a computing (MapReduce) and a storage (HDFS) platform. As the need for big data analysis grew, Hadoop attracted many other software projects that address big data problems together, and these merged into a Hadoop-centric big data ecosystem. The following diagram gives a brief introduction to the Hadoop ecosystem and its core software components:

The Hadoop ecosystem

In the current Hadoop ecosystem, HDFS is still the major storage option. On top of it, Snappy compression and the RCFile, Parquet, and ORC file formats can be used for storage optimization. Hadoop 2.0 introduced YARN, which separates resource management from MapReduce for better performance and scalability. Spark and Tez, as solutions for faster, near-real-time processing, can run on YARN and work closely with Hadoop. HBase is a leading NoSQL database, especially when a NoSQL database is needed on an already deployed Hadoop cluster. Sqoop is still one of the leading, mature tools for exchanging data between Hadoop and relational databases. Flume is a mature, distributed, and reliable log-collecting tool for moving data into HDFS. Impala and Presto query data on HDFS directly for better performance. Hortonworks, by contrast, focuses on the Stinger initiative to make Hive 100 times faster. In addition, Hive on Spark and Hive on Tez let users run Hive on computing frameworks other than MapReduce. As a result, Hive is playing a more important role in the ecosystem than ever.
