
Task-level native optimization

MapReduce has added support for a native implementation of the map output collector. This new support can result in a performance improvement of about 30% or more, particularly for shuffle-intensive jobs.

The native library is built automatically when Hadoop is built with the -Pnative Maven profile. Users may choose the new collector on a job-by-job basis by setting mapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator in their job configuration.
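
As an illustration, the per-job setting could be written as a property in the job's configuration XML. The property name and class come from the text above; placing it in a job-specific configuration file rather than a cluster-wide one is an assumption made here to keep the choice job-by-job.

```xml
<!-- Sketch: selects the native map output collector for one job. -->
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator</value>
</property>
```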

The basic idea is to add a NativeMapOutputCollector that handles the key/value pairs emitted by the mapper. As a result, sort, spill, and IFile serialization can all be done in native code. A preliminary test (on a Xeon E5410 with jdk6u24) showed promising results:

  • Sort is about 3-10 times faster than Java (only binary string compare is supported)
  • IFile serialization is about three times faster than Java, at about 500 MB per second. If hardware CRC32C is used, throughput can rise to 1 GB per second or higher
  • Merge code is not completed yet, so the test sets io.sort.mb high enough to avoid intermediate spills
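
To make the "binary string compare" restriction concrete, here is a minimal Java sketch of a byte-wise key comparison. It assumes the phrase means memcmp-style ordering, where serialized keys are compared as unsigned bytes and a key that is a prefix of another sorts first. The class and method names are hypothetical and are not taken from the native collector itself.

```java
import java.nio.charset.StandardCharsets;

public class BinaryKeyCompare {
    // Hypothetical helper: compares two serialized keys byte by byte,
    // treating each byte as unsigned (memcmp-style ordering).
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask to read the byte as 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] apricot = "apricot".getBytes(StandardCharsets.UTF_8);
        // Negative result: "apple" sorts before "apricot" in byte order.
        System.out.println(compareBytes(apple, apricot));
    }
}
```

Keys whose natural order differs from their byte order (for example, signed integers in two's-complement form) would not sort as expected under such a comparison, which is presumably why the restriction is called out.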