Shuffle and sort

Once the mappers have finished processing the input data (essentially, splitting the data and generating key/value pairs), their output has to be distributed across the cluster so that the reduce tasks can start. A reduce task therefore begins with the shuffle and sort step. It takes the output files written by all of the mappers, fetches from each one the partition that the partitioner assigned to this reducer, and downloads those pieces to the local machine on which the reduce task is running. These individual pieces of data are then sorted by key into one larger list of key/value pairs. The purpose of this sort is to group equal keys together, so that their values can be iterated over easily in the reduce task. The framework handles all of this automatically, while still allowing custom code to control how keys are sorted and grouped.
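To make the data flow concrete, here is a minimal single-process sketch of shuffle and sort in Python. It is a simulation, not Hadoop code: `map_side`, `reduce_side`, and `run_job` are hypothetical names, and a `zlib.crc32` hash stands in for Hadoop's default hash partitioner. The sketch assumes each mapper's output is sorted per partition before the reducer fetches it, so the reduce side only needs a k-way merge followed by grouping. The optional `sort_key` and `group_key` hooks loosely play the role of the custom sort and grouping comparators mentioned above. The sketch also assumes the partitioner hashes the grouping key, so every member of a group reaches the same reducer.

```python
import heapq
import zlib
from itertools import groupby
from typing import Callable, Dict, Iterator, List, Tuple

KV = Tuple[str, int]
KeyFn = Callable[[str], object]


def identity(key: str) -> object:
    return key


def map_side(records: List[KV], num_reducers: int,
             sort_key: KeyFn = identity,
             group_key: KeyFn = identity) -> Dict[int, List[KV]]:
    """Partition one mapper's output and sort each partition into a 'run'."""
    runs: Dict[int, List[KV]] = {r: [] for r in range(num_reducers)}
    for key, value in records:
        # Hash the grouping key so that a whole group lands on a single reducer.
        p = zlib.crc32(str(group_key(key)).encode("utf-8")) % num_reducers
        runs[p].append((key, value))
    for run in runs.values():
        run.sort(key=lambda kv: sort_key(kv[0]))
    return runs


def reduce_side(all_runs: List[Dict[int, List[KV]]], reducer_id: int,
                sort_key: KeyFn = identity,
                group_key: KeyFn = identity) -> Iterator[Tuple[object, List[int]]]:
    """Fetch this reducer's run from every mapper, merge them, and group by key."""
    fetched = [runs[reducer_id] for runs in all_runs]
    # k-way merge of the already-sorted runs into one sorted stream
    merged = heapq.merge(*fetched, key=lambda kv: sort_key(kv[0]))
    # groupby only joins adjacent items, so sort_key must keep equal group keys together
    for group, pairs in groupby(merged, key=lambda kv: group_key(kv[0])):
        yield group, [value for _, value in pairs]


def run_job(mappers: List[List[KV]], num_reducers: int,
            sort_key: KeyFn = identity,
            group_key: KeyFn = identity) -> Dict[int, List[Tuple[object, List[int]]]]:
    """Run the map-side partitioning, then shuffle/sort for every reducer."""
    all_runs = [map_side(m, num_reducers, sort_key, group_key) for m in mappers]
    return {r: list(reduce_side(all_runs, r, sort_key, group_key))
            for r in range(num_reducers)}


if __name__ == "__main__":
    mappers = [[("the", 1), ("cat", 2), ("The", 3)], [("cat", 4), ("the", 5)]]
    # Default: keys sorted and grouped exactly ("The" and "the" stay separate)
    print(run_job(mappers, 2))
    # Custom: case-insensitive grouping, with a sort key that orders by the lowercased key first
    print(run_job(mappers, 2, sort_key=lambda k: (k.lower(), k), group_key=str.lower))
```

With the default hooks, each reducer receives every value for its keys in one list, with keys in sorted order. With the case-insensitive hooks, `"The"` and `"the"` collapse into a single group. This only works because the sort key orders by the lowercased key first; otherwise the members of a group would not be adjacent. A real custom grouping comparator in Hadoop carries the same requirement relative to the sort comparator.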
