
Components of Apache Hadoop

Apache Hadoop is composed of two core components. They are:

  • HDFS: The Hadoop Distributed File System (HDFS) is the storage component of Apache Hadoop, designed and developed to handle large files efficiently. It is a distributed filesystem that runs on a cluster and stores large files by splitting them into blocks and distributing those blocks redundantly across multiple nodes. Users of HDFS need not worry about the underlying networking details, as HDFS takes care of them. HDFS is written in Java and runs in user space.
  • MapReduce: MapReduce is a programming model that draws on ideas from functional programming and distributed computing. A MapReduce job is broken down into two phases: map and reduce. All data flows through the model as key-value pairs, <key, value>: mappers emit key-value pairs, and reducers receive them, process them, and produce the final result. The model was built specifically to query and process the large volumes of data stored in HDFS.
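The block-splitting idea behind HDFS can be sketched in a few lines. This is an illustrative model only, not HDFS's actual Java implementation; the 128 MB block size and replication factor of 3 shown here are HDFS's defaults, but both are configurable per cluster:

```python
# Illustrative sketch (not the real HDFS code): how a large file is
# divided into fixed-size blocks, each of which HDFS would then store
# redundantly on several nodes.
BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size (bytes)
REPLICATION = 3                  # default HDFS replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (block_index, block_length) pairs covering the file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

# A 300 MB file becomes three blocks: 128 MB, 128 MB, and 44 MB.
# With replication, the cluster holds REPLICATION copies of each block.
blocks = split_into_blocks(300 * 1024 * 1024)
```

Because each block is replicated on multiple nodes, the loss of a single node does not lose data, and large files need not fit on any one disk.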
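The key-value flow described above can be demonstrated with the classic word-count example. The sketch below simulates the three stages in plain Python; real Hadoop jobs are written against the Java MapReduce API and run distributed across a cluster, whereas here the shuffle stage (grouping values by key between mappers and reducers) is collapsed into a single in-memory function:

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a <key, value> pair (word, 1) for every word seen.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key before they reach the reducers.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: combine the values for each key into the final result.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["to be or not to be"])))
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

Because each mapper and each reducer works on its own slice of keys independently, the framework can run many of them in parallel across the nodes holding the data.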

We will go through HDFS and MapReduce in depth in the next chapter.
