
The Hadoop platform

Hadoop can be used for many things. However, when you break it down to its core parts, its primary features are the Hadoop Distributed File System (HDFS) and MapReduce.

HDFS stores files that are written once and read many times, splitting them into large blocks that it distributes and replicates across a Hadoop cluster. Two services are involved with the filesystem. The first service, the NameNode, acts as the master: it keeps the directory tree of all files in the filesystem and tracks where the blocks of each file are kept across the cluster. The actual file data is stored on multiple DataNodes, the second service.
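To make the block and replica idea concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API. The function names, the 128 MB block size, the replication factor of 3, and the round-robin placement are all assumptions chosen for illustration; actual HDFS placement is rack-aware and considerably more involved.

```python
# Illustrative model only: hypothetical helpers, not part of Hadoop.
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # assumed block size for this sketch
REPLICATION = 3         # assumed replication factor for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block a file would occupy."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Map each block index to the DataNodes holding a copy.

    Uses simple round-robin placement, which is an assumption of this
    sketch and not the policy HDFS applies.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    return {
        block: [datanodes[(block + r) % len(datanodes)]
                for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    sizes = split_into_blocks(300 * MB)
    block_map = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
    for block, nodes in block_map.items():
        print(block, sizes[block] // MB, "MB", nodes)
```

In this model, the dictionary returned by `place_replicas` plays the role of the NameNode's block map, while the DataNode names stand in for the machines that hold the bytes.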

MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. The most prominent trait of Hadoop is that it brings the processing to the data: MapReduce executes tasks as close to the data as possible, rather than moving the data to where the processing is performed. Two services are involved in job execution. A job is submitted to the JobTracker service, which first discovers the location of the data and then orchestrates the execution of the map and reduce tasks. The actual tasks are executed on multiple TaskTracker nodes.
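The programming model itself can be sketched in a few lines of plain Python using the classic word-count example. This is a single-process illustration, not Hadoop code: the function names are hypothetical, each inner list stands in for one input split, and there is no JobTracker, TaskTracker, or data locality involved.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def run_job(splits):
    """Run map, shuffle, and reduce over a list of input splits.

    On a cluster each split would be handled by a separate map task;
    here everything runs sequentially in one process.
    """
    mapped = [pair
              for split in splits
              for line in split
              for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values)
                for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    splits = [["big data big cluster"], ["data moves to code", "code moves to data"]]
    print(run_job(splits))
```

Because the map step handles each line independently and the reduce step handles each key independently, both phases can be spread across many nodes, which is what lets the framework schedule tasks near the blocks they read.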

Hadoop automatically handles infrastructure failures such as network issues and node or disk failures. Overall, it provides a framework for distributed storage, through its distributed file system, and for the distributed execution of jobs. In addition, the ZooKeeper service is used to maintain configuration and provide distributed synchronization.

Many projects surround Hadoop and complete the ecosystem of available Big Data processing tools, such as utilities to import and export data, NoSQL databases, and event/real-time processing systems. The technologies that move Hadoop beyond batch processing focus on in-memory execution models. Overall, multiple projects exist, covering batch, hybrid, and real-time execution.
