- Big Data Analytics with Hadoop 3
- Sridhar Alla
Shuffle and sort
Once the mappers have finished processing the input data (essentially, splitting the data and generating key/value pairs), their output must be distributed across the cluster before the reduce tasks can begin. A reduce task therefore starts with the shuffle and sort step: it fetches the output files written by all of the mappers (and their subsequent partitioners) to the local machine on which the reduce task is running. These individual data pieces are then sorted by key into one larger list of key/value pairs. The purpose of this sort is to group equivalent keys together, so that their values can be iterated over easily in the reduce task. The framework handles all of this automatically, while allowing custom code to control how the keys are sorted and grouped.
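To make the mechanics concrete, here is a minimal standalone sketch of the shuffle-and-sort step in plain Python (a simulation, not Hadoop API code): each mapper's key/value pairs are routed to a reducer by hash partitioning, then each reducer's partition is sorted so that equivalent keys sit next to each other and their values can be grouped in one pass. The function and variable names here are illustrative, not part of any Hadoop interface.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_outputs, num_reducers=2):
    """Simulate shuffle and sort: partition each (key, value) pair to a
    reducer by hashing the key, then sort each partition and group values
    by key -- mirroring what the MapReduce framework does automatically."""
    # Partition: every mapper's output is split among the reducers, so all
    # pairs with the same key end up on the same reducer.
    partitions = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            partitions[hash(key) % num_reducers].append((key, value))
    # Sort and group: after sorting, equivalent keys are adjacent, so the
    # reduce task can iterate over each key's values easily.
    grouped = []
    for part in partitions:
        part.sort(key=itemgetter(0))
        grouped.append({k: [v for _, v in pairs]
                        for k, pairs in groupby(part, key=itemgetter(0))})
    return grouped

# Two mappers emit word counts; the shuffle brings equal keys together
# on a single reducer before the reduce function runs.
mappers = [[("apple", 1), ("banana", 1)],
           [("apple", 1), ("cherry", 1)]]
reducers = shuffle_and_sort(mappers)
```

In real Hadoop jobs the hash partitioner, the merge sort of fetched map outputs, and the grouping of values are all performed by the framework; the custom hooks mentioned above correspond to supplying your own partitioner, sort comparator, and grouping comparator classes.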