- Apache Spark 2.x for Java Developers
- Sourav Gulati, Sumit Kumar
HDFS I/O
An HDFS read operation from a client involves the following:
- The client requests NameNode to determine where the actual data blocks are stored for a given file.
- NameNode obliges by providing the block IDs and locations of the hosts (DataNode) where the data can be found.
- The client then contacts the DataNodes directly, using the respective block IDs, and fetches the data while preserving the order of the blocks.
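From the client's point of view, the steps above are hidden behind the Hadoop `FileSystem` API: opening a file returns an `FSDataInputStream` that performs the NameNode lookup and the DataNode reads internally. A minimal sketch (the NameNode address and file path are assumptions for illustration):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; address below is an assumption.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // open() asks the NameNode for the block locations; the returned
        // stream then reads each block, in order, from the DataNodes.
        try (FSDataInputStream in = fs.open(new Path("/data/input.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Running this requires the Hadoop client libraries on the classpath and a reachable HDFS cluster; the data never flows through the NameNode, only the block metadata does.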

An HDFS write operation from a client involves the following:
- The client contacts NameNode to update the namespace with the filename and verify the necessary permissions.
- If the file already exists, NameNode throws an error; otherwise, the client receives an FSDataOutputStream, whose writes go into a client-side data queue.
- The data queue negotiates with the NameNode to allocate new blocks on suitable DataNodes.
- The data is then copied to that DataNode, and, as per the replication strategy, the data is further copied from that DataNode to the rest of the DataNodes.
- It's important to note that the data is never moved through the NameNode, as that would cause a performance bottleneck.
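The write path above can be sketched with the same `FileSystem` API: `create()` performs the namespace update and permission check at the NameNode, and the returned `FSDataOutputStream` feeds the data queue that is pipelined to the DataNodes. The NameNode address and output path below are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; adjust fs.defaultFS for your cluster.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // create() contacts the NameNode to register the filename and verify
        // permissions; with overwrite=false it fails if the file exists.
        Path path = new Path("/data/output.txt");
        try (FSDataOutputStream out = fs.create(path, false)) {
            // Writes are buffered into the data queue; the client library
            // pipelines packets to the DataNodes the NameNode allocated,
            // and replication proceeds DataNode-to-DataNode.
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

As with the read path, only block allocation requests reach the NameNode; the bytes travel directly through the DataNode pipeline.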