
DataNode

In the Hadoop ecosystem, DataNode is primarily responsible for storing application data in distributed and replicated form. It acts as a worker (slave) in the system and is controlled by NameNode. Storage in the Hadoop system is divided into blocks, just like a traditional computer storage device. A block is the minimal unit in which the Hadoop filesystem reads or writes data. This design gives HDFS a natural advantage in slicing large files into blocks and storing them across multiple nodes. The default block size is 64 MB or 128 MB, depending on the Hadoop version (64 MB in Hadoop 1.x, 128 MB from Hadoop 2.x onward), and it can be changed through configuration (the dfs.blocksize property). HDFS is designed to support very large files and write-once-read-many semantics.
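To make the block arithmetic concrete, here is a minimal sketch of how a file's size maps onto blocks. It is not Hadoop code: the function name `split_into_blocks` and the constant `MB` are hypothetical names chosen for illustration, and the 128 MB default is simply the larger of the two block sizes mentioned above.

```python
from typing import List

MB = 1024 * 1024  # bytes per mebibyte (illustrative constant)


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the size in bytes of each block needed for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


if __name__ == "__main__":
    sizes = split_into_blocks(300 * MB)
    print([s // MB for s in sizes])  # [128, 128, 44]
```

With a 64 MB block size, the same 300 MB file would need five blocks instead of three, which is why the block size setting affects how many block entries NameNode has to track.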

DataNodes are primarily responsible for storing and retrieving these blocks when consumers request them through NameNode. In Hadoop 3.x, a DataNode stores not only the data blocks but also, in a distributed manner, checksum or parity information for the original blocks. DataNodes follow a replication pipeline mechanism: a DataNode stores the data it receives in chunks and propagates those chunks to the other DataNodes in the pipeline.
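The replication pipeline can be pictured with a small in-memory simulation. This is only a sketch under simplifying assumptions: `SketchDataNode`, `write_block`, and the tiny `packet_size` are hypothetical names and values, there is no networking or failure handling, and it does not reflect Hadoop's actual classes or wire protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchDataNode:
    """Hypothetical stand-in for a DataNode that keeps block bytes in memory."""
    name: str
    stored: Dict[str, bytes] = field(default_factory=dict)

    def receive(self, block_id: str, packet: bytes,
                downstream: List["SketchDataNode"]) -> List[str]:
        # Append the packet to the local copy of the block.
        self.stored[block_id] = self.stored.get(block_id, b"") + packet
        acks: List[str] = []
        if downstream:
            # Forward the same packet to the next node in the pipeline.
            acks = downstream[0].receive(block_id, packet, downstream[1:])
        # Acknowledgements travel back upstream, last node first.
        return acks + [self.name]


def write_block(block_id: str, data: bytes,
                pipeline: List[SketchDataNode], packet_size: int = 4) -> int:
    """Send data through the pipeline in packets; return the number of packets sent."""
    if not pipeline:
        raise ValueError("pipeline must contain at least one node")
    packets_sent = 0
    for start in range(0, len(data), packet_size):
        packet = data[start:start + packet_size]
        acks = pipeline[0].receive(block_id, packet, pipeline[1:])
        if len(acks) != len(pipeline):
            raise IOError("packet was not acknowledged by every replica")
        packets_sent += 1
    return packets_sent


if __name__ == "__main__":
    nodes = [SketchDataNode("dn1"), SketchDataNode("dn2"), SketchDataNode("dn3")]
    write_block("blk_1", b"hello hadoop", nodes)
    print(all(n.stored["blk_1"] == b"hello hadoop" for n in nodes))  # True
```

The point of the pipeline shape is that the client sends each packet only once, to the first node; every further replica is produced by node-to-node forwarding.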

When a cluster starts, NameNode starts in safe mode and remains there until the DataNodes have registered their data block information with it. Once this information is validated, NameNode starts serving client requests. When a DataNode starts, it first connects to NameNode and reports which data blocks it holds. NameNode records this information in its registry, and when a client requests a certain block, NameNode points the client to the respective DataNode. The client then interacts with that DataNode directly to read or write the data block. While the cluster is running, each DataNode communicates with NameNode periodically by sending a heartbeat signal. The heartbeat frequency can be set in the configuration files (the dfs.heartbeat.interval property, 3 seconds by default).
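The startup and heartbeat behaviour described above can be sketched as a small registry. Everything here is an illustrative assumption rather than Hadoop's implementation: the class `SketchNameNode`, its method names, and the `heartbeat_timeout` parameter are hypothetical. The sketch also simplifies safe mode to "every expected block has been reported at least once" and refuses lookups until then.

```python
from typing import Dict, Iterable, List, Set


class SketchNameNode:
    """Hypothetical registry mapping block ids to the DataNodes that reported them."""

    def __init__(self, expected_blocks: Iterable[str], heartbeat_timeout: float = 30.0):
        self.expected_blocks: Set[str] = set(expected_blocks)
        self.block_locations: Dict[str, Set[str]] = {}
        self.last_heartbeat: Dict[str, float] = {}
        self.heartbeat_timeout = heartbeat_timeout

    def block_report(self, datanode: str, block_ids: Iterable[str], now: float) -> None:
        # A starting DataNode registers every block it holds.
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)
        self.last_heartbeat[datanode] = now  # a report also proves liveness

    def heartbeat(self, datanode: str, now: float) -> None:
        self.last_heartbeat[datanode] = now

    def in_safe_mode(self) -> bool:
        # Simplification: leave safe mode once every expected block is reported.
        return not self.expected_blocks <= set(self.block_locations)

    def locate(self, block_id: str, now: float) -> List[str]:
        """Return the live DataNodes holding block_id, for the client to contact directly."""
        if self.in_safe_mode():
            raise RuntimeError("NameNode is still in safe mode")
        return sorted(
            dn for dn in self.block_locations.get(block_id, set())
            if now - self.last_heartbeat.get(dn, float("-inf")) <= self.heartbeat_timeout
        )


if __name__ == "__main__":
    nn = SketchNameNode({"blk_1", "blk_2"}, heartbeat_timeout=10.0)
    nn.block_report("dn1", ["blk_1"], now=0.0)
    nn.block_report("dn2", ["blk_1", "blk_2"], now=0.0)
    nn.heartbeat("dn2", now=12.0)
    print(nn.locate("blk_1", now=15.0))  # ['dn2'] because dn1 missed its heartbeats
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic, so the effect of a missed heartbeat is easy to see.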

We have gone through the key architecture components of the Apache Hadoop framework; we will develop a deeper understanding of each of these areas in the following chapters.
