- Programming MapReduce with Scalding
- Antonios Chalkiopoulos
- 299 characters
- 2021-12-08 12:44:21
The Hadoop platform
Hadoop can be used for a lot of things. However, when you break it down to its core parts, the primary features of Hadoop are the Hadoop Distributed File System (HDFS) and MapReduce.
HDFS stores files that are written once and then read-only. It splits them into large blocks, then distributes and replicates those blocks across a Hadoop cluster. Two services are involved with the filesystem. The first service, the NameNode, acts as a master: it keeps the directory tree of all files in the filesystem, records which blocks make up each file, and tracks where those blocks are kept across the cluster. The second service, the DataNode, runs on multiple nodes and stores the actual file data.
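To make the NameNode/DataNode split concrete, here is a minimal Python sketch of the bookkeeping involved. It is a toy model, not Hadoop code: the `NameNode` class and its `add_file` method are hypothetical, the 128 MB block size and replication factor of 3 are assumed values, and replicas are placed round-robin instead of with HDFS's rack-aware placement policy. Note that the master stores only metadata; no file bytes pass through it.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes
REPLICATION = 3                 # assumed replication factor


@dataclass
class NameNode:
    """Hypothetical master holding only metadata: file -> blocks, block -> DataNodes."""
    datanodes: list
    files: dict = field(default_factory=dict)      # path -> list of block ids
    locations: dict = field(default_factory=dict)  # block id -> list of DataNodes
    next_block: int = 0
    next_node: int = 0

    def add_file(self, path, size_bytes):
        """Split a file of size_bytes into blocks and choose replica locations."""
        # ceil(size / BLOCK_SIZE): the last block may be smaller than BLOCK_SIZE.
        n_blocks = -(-size_bytes // BLOCK_SIZE)
        copies = min(REPLICATION, len(self.datanodes))
        block_ids = []
        for _ in range(n_blocks):
            block_id = f"blk_{self.next_block}"
            self.next_block += 1
            # Round-robin over DataNodes so the copies of one block land on distinct nodes.
            start = self.next_node % len(self.datanodes)
            self.locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)] for i in range(copies)
            ]
            self.next_node += 1
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids


if __name__ == "__main__":
    nn = NameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
    blocks = nn.add_file("/logs/day1.log", 300 * 1024 * 1024)  # a 300 MB file
    for b in blocks:
        print(b, nn.locations[b])
```

With four DataNodes, the 300 MB file becomes three blocks (128 MB, 128 MB and 44 MB), each stored on three different nodes, so losing any single node in this model still leaves two copies of every block.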
MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. The most prominent trait of Hadoop is that it brings processing to the data: MapReduce runs tasks on the nodes closest to the data instead of moving the data to where the processing happens. Two services are involved in job execution. A job is submitted to the JobTracker service, which first discovers the location of the data and then orchestrates the execution of the map and reduce tasks. The actual tasks are executed on multiple TaskTracker nodes.
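The following Python sketch simulates a word-count job in a single process to illustrate the two ideas in the paragraph above: data-local scheduling and the map and reduce phases, with the framework's shuffle in between. All names (`schedule_map_tasks`, `run_job`, the block and node ids) are hypothetical, and the scheduler is only a simplified stand-in for the JobTracker: it prefers a node that already holds a replica of the block and breaks ties by current load.

```python
from collections import defaultdict


def schedule_map_tasks(block_locations, trackers):
    """Assign each block's map task to a tracker, preferring one that stores the block."""
    load = {t: 0 for t in trackers}
    assignment = {}
    for block, holders in block_locations.items():
        # Data locality: candidates are live trackers that already hold a replica.
        local = [t for t in holders if t in load]
        candidates = local or trackers  # fall back to a non-local tracker
        chosen = min(candidates, key=lambda t: load[t])  # least-loaded candidate
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


def map_phase(text):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in text.lower().split()]


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_job(blocks, block_locations, trackers):
    assignment = schedule_map_tasks(block_locations, trackers)
    intermediate = []
    for block, text in blocks.items():
        # On a real cluster this map task would run on assignment[block]; here it runs locally.
        intermediate.extend(map_phase(text))
    grouped = shuffle(intermediate)
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())
    return assignment, counts


if __name__ == "__main__":
    blocks = {"blk_0": "the quick brown fox", "blk_1": "the lazy dog", "blk_2": "The end"}
    locations = {"blk_0": ["dn1", "dn2"], "blk_1": ["dn2", "dn3"], "blk_2": ["dn3", "dn1"]}
    assignment, counts = run_job(blocks, locations, ["dn1", "dn2", "dn3"])
    print(assignment)
    print(counts)
```

Because every block here has a replica on some live node, each map task is assigned to a node that already stores its input; in this model only the intermediate (word, 1) pairs would need to cross the network during the shuffle.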
Hadoop automatically handles infrastructure failures such as network issues and node or disk failures. Overall, it provides a framework for both distributed storage, through its distributed filesystem, and distributed job execution. Moreover, the ZooKeeper service, part of the wider Hadoop ecosystem, maintains configuration information and provides distributed synchronization.
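One of these automatic recovery mechanisms is re-replication: when a DataNode stops sending heartbeats, the NameNode marks it dead and schedules new copies of the blocks it held. The Python sketch below models only that last step under simplifying assumptions: the `rereplicate` function is hypothetical, a target of three replicas is assumed, and each new copy goes to the least-loaded healthy node that does not already hold the block.

```python
from collections import Counter


def rereplicate(locations, live_nodes, failed_node, target=3):
    """Return a new block -> replicas map with failed_node removed and copies restored."""
    healthy = [n for n in live_nodes if n != failed_node]
    # Count current replicas per surviving node so new copies go to lightly loaded nodes.
    load = Counter(n for replicas in locations.values() for n in replicas if n != failed_node)
    repaired = {}
    for block, replicas in locations.items():
        survivors = [n for n in replicas if n != failed_node]  # new list; input untouched
        if not survivors:
            # Every copy was on the failed node: nothing left to copy from in this model.
            repaired[block] = []
            continue
        while len(survivors) < target:
            candidates = [n for n in healthy if n not in survivors]
            if not candidates:
                break  # not enough healthy nodes to reach the target
            new_node = min(candidates, key=lambda n: load[n])
            # On a real cluster the bytes would be copied from a surviving replica.
            survivors.append(new_node)
            load[new_node] += 1
        repaired[block] = survivors
    return repaired


if __name__ == "__main__":
    locations = {
        "blk_0": ["dn1", "dn2", "dn3"],
        "blk_1": ["dn2", "dn3", "dn4"],
        "blk_2": ["dn3", "dn4", "dn1"],
    }
    print(rereplicate(locations, ["dn1", "dn2", "dn3", "dn4"], failed_node="dn3"))
```

Keeping several replicas is what makes this recovery possible without user intervention: as long as one copy of a block survives, the master can restore the replication target from it.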
Many projects surround Hadoop and complete the ecosystem of Big Data processing tools, such as utilities to import and export data, NoSQL databases, and event/real-time processing systems. The technologies that move Hadoop beyond batch processing focus on in-memory execution models. Overall, projects exist for every execution style, from batch to hybrid and real-time processing.