- Big Data Analytics with Hadoop 3
- Sridhar Alla
- 2021-06-25 21:26:15
Partitioner
The partitioner takes the intermediate key/value pairs from the mapper (or combiner if it is being used) and splits them up into shards, one shard per reducer.
The application's partition function assigns each map output to a particular reducer. The partition function is given the key and the number of reducers, and returns the index of the desired reducer.
A typical default is to hash the key and take the hash value modulo the number of reducers:
partitionId = hash(key) % R, where R is the number of reducers
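The formula can be sketched in a few lines of Python. This is only an illustration, not Hadoop's own implementation: the names partition_id and partition are hypothetical, and zlib.crc32 is an assumed stand-in hash, chosen because Python's built-in hash() for strings changes between processes.

```python
import zlib


def partition_id(key: str, num_reducers: int) -> int:
    """Return the reducer index for a key: hash(key) % R."""
    # Assumption: CRC32 stands in for the hash function. Any hash that is
    # stable across processes and machines would do.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition(pairs, num_reducers):
    """Split (key, value) pairs into one shard per reducer."""
    shards = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        shards[partition_id(key, num_reducers)].append((key, value))
    return shards
```

Calling partition([("apple", 1), ("pear", 1), ("apple", 1)], 4) returns four lists, and both "apple" pairs land in the same one.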
It is important to pick a partition function that gives an approximately uniform distribution of data per shard for load-balancing purposes; otherwise the MapReduce operation can be held up waiting for slow reducers to finish (that is, the reducers assigned the larger shares of the skewed data).
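One rough way to see the effect of skew is to count how many records each reducer would receive. The sketch below uses made-up data and hypothetical helper names (shard_sizes, imbalance), again with CRC32 as an assumed hash. It compares a spread of distinct keys against a data set dominated by one hot key.

```python
import zlib
from collections import Counter


def partition_id(key, num_reducers):
    # Assumption: CRC32 as a stable stand-in for the partitioner's hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shard_sizes(keys, num_reducers):
    """Number of records each reducer would receive."""
    counts = Counter(partition_id(k, num_reducers) for k in keys)
    return [counts.get(i, 0) for i in range(num_reducers)]


def imbalance(sizes):
    """Largest shard divided by the mean shard size (1.0 is perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Illustrative inputs: many distinct keys versus one dominant key.
uniform_keys = [f"user-{i}" for i in range(10_000)]
skewed_keys = ["hot-key"] * 9_000 + [f"user-{i}" for i in range(1_000)]

print(imbalance(shard_sizes(uniform_keys, 4)))
print(imbalance(shard_sizes(skewed_keys, 4)))
```

No hash function can split a single hot key across reducers, so the skewed case always leaves one reducer with at least 9,000 of the 10,000 records.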
Between the map and reduce stages, the data is shuffled (sorted in parallel and then exchanged between nodes) in order to move it from the map nodes that produced it to the shards in which it will be reduced. The shuffle can sometimes take longer than the computation itself, depending on network bandwidth, CPU speeds, the volume of data produced, and the time taken by the map and reduce computations.
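The shuffle can be imitated in a single process to show what each reducer ends up receiving. This is a simplified sketch with hypothetical names: a real shuffle moves data over the network and merges sorted spill files, which the code below does not attempt.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_id(key, num_reducers):
    # Assumption: CRC32 as a stable stand-in for the partitioner's hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route, sort, and group map output.

    map_outputs holds one list of (key, value) pairs per map task.
    Returns one list per reducer of (key, [values]) entries sorted by key.
    """
    shards = [[] for _ in range(num_reducers)]
    for task_output in map_outputs:
        for key, value in task_output:
            shards[partition_id(key, num_reducers)].append((key, value))

    reducer_inputs = []
    for shard in shards:
        shard.sort(key=itemgetter(0))  # sort each shard by key
        grouped = [
            (key, [value for _, value in group])
            for key, group in groupby(shard, key=itemgetter(0))
        ]
        reducer_inputs.append(grouped)
    return reducer_inputs
```

For example, shuffle([[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]], 2) gathers both "a" values from the two map tasks into a single entry for one reducer.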
By default, the partitioner computes a hash code for each key; in Hadoop, the default HashPartitioner uses the key's hashCode() method rather than a cryptographic checksum such as MD5. Taking that hash modulo the number of reducers spreads the keyspace roughly evenly over the reducers, and because the assignment is deterministic, identical keys emitted by different mappers end up at the same reducer. The default behavior can be replaced with a custom partitioner, for example one that assigns key ranges to reducers so that the combined output is globally sorted. The partitioned data is written to the local filesystem of each map task and waits to be pulled by its corresponding reducer.
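As an illustration of a custom partitioner, the sketch below assigns key ranges to reducers instead of hashing, so reducer 0 holds the smallest keys and the last reducer the largest. The function name make_range_partitioner and the split points are hypothetical. In practice the split points would have to be chosen from a sample of the data to keep the shards balanced.

```python
from bisect import bisect_right


def make_range_partitioner(split_points):
    """Build a partition function from R-1 sorted boundary keys.

    Keys below split_points[0] go to reducer 0, keys from split_points[0]
    up to (but excluding) split_points[1] go to reducer 1, and so on.
    """
    boundaries = sorted(split_points)

    def partition_id(key, num_reducers):
        if len(boundaries) != num_reducers - 1:
            raise ValueError("need exactly num_reducers - 1 split points")
        return bisect_right(boundaries, key)

    return partition_id


# Hypothetical split points for three reducers.
by_range = make_range_partitioner(["h", "p"])
```

With these split points, "apple" goes to reducer 0, "kiwi" to reducer 1, and "zebra" to reducer 2, so concatenating the sorted reducer outputs in index order yields a globally sorted result.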