  • Mastering MongoDB 3.x
  • Alex Giamas
  • 468 words
  • 2021-08-20 10:10:56

Troubleshooting MapReduce

Over the years, one of the major shortcomings of MapReduce frameworks has been how hard they are to troubleshoot compared with simpler, non-distributed patterns. Most of the time, the most effective tool is debugging with log statements to verify that output values match the values we expect. Since the mongo shell is a JavaScript shell, this is as simple as writing output with the print() function, or printjson() for whole documents.

Diving deeper into MapReduce in MongoDB, we can debug both the map and the reduce phases by redefining the functions involved so that they print their output values.

To debug the mapper phase, we can redefine the emit() function so that it prints each key-value pair that the mapper outputs:

> var emit = function(key, value) {
    print("debugging mapper's emit");
    print("key: " + key + " value: " + tojson(value));
};

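Assuming our map function is stored in a variable named mapper, we can then call it manually on a single document to verify that we get back the key-value pair we expect. Calling apply() binds the document to this, which is how the map function sees each document during MapReduce: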

> var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
> mapper.apply(myDoc);
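
The same idea can be tried without a running database. The following is a minimal sketch in plain JavaScript, assuming a hypothetical mapper and a hand-written stand-in document; the field names customerId and amount are illustrative and do not come from the original orders collection. Instead of printing, this variant of emit() records each pair in an array so the result can be inspected afterwards.

```javascript
// Hypothetical mapper: emits one (customerId, amount) pair per order.
// The field names are illustrative assumptions, not a real schema.
var mapper = function () {
    emit(this.customerId, this.amount);
};

// Replacement emit() that records pairs instead of handing them to MapReduce.
var emitted = [];
var emit = function (key, value) {
    emitted.push({ key: key, value: value });
};

// Stand-in for a document that would normally come from findOne().
var myDoc = { _id: 1, customerId: "c42", amount: 25 };

// apply() binds the document to `this`, as MapReduce does for each document.
mapper.apply(myDoc);
```

After this runs, emitted holds a single entry with key "c42" and value 25. Recording the pairs rather than printing them makes it easier to compare the mapper's output against expected values in a script.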

The reducer function is somewhat more complicated. A MapReduce reducer function must meet the following criteria:

  • It must be idempotent
  • It must be associative
  • The order of values coming from the mapper function should not matter for the reducer's result
  • The reduce function must return the same type of result as the mapper function

Let's look at each of these requirements to understand what they really mean:

  • It must be idempotent: MapReduce may, by design, call the reducer multiple times for the same key when the mapper phase produces multiple values for it, feeding the output of one call back in as input to a later call. The reducer is also not called for a key that has only a single value; that value is simply added to the result set as it is. The final value should be the same no matter how many times the reducer runs. This can be verified by writing our own "verifier" function that forces the reducer to re-reduce, or by executing the reducer many times:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
  • It must be associative: Again, because the reducer may be invoked multiple times for the same key when it has multiple values, a partial result can end up being combined with values that have not been reduced yet, so the following should hold:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
  • The order of values coming from the mapper function should not matter for the reducer's result (in other words, the reducer must be commutative): We can test this by passing the values to the reducer in a different order and verifying that we get the same result:
reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
  • The reduce function must return the same type of result as the mapper function: This goes hand in hand with the first requirement. Because the reducer's output can be passed back in as one of its input values, the type of object that the reduce function returns should be the same as the type of the value emitted by the mapper function.
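
The identities above can be turned into a small "verifier". The following is a minimal sketch in plain JavaScript, assuming two hypothetical reducers (a sum and an average) and a helper named checkReducer that is not part of MongoDB. Results are compared through JSON.stringify, which is a crude equality check that is good enough for numbers and simple objects with a stable key order.

```javascript
// Hypothetical reducer: sums the numeric values emitted for a key.
var reducer = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
};

// Hypothetical faulty reducer: averaging partial averages gives wrong results.
var badReducer = function (key, values) {
    return reducer(key, values) / values.length;
};

// Illustrative helper: evaluates each requirement for one key and three values.
var checkReducer = function (reduce, key, a, b, c) {
    var all = [a, b, c];
    var same = function (x, y) {
        return JSON.stringify(x) === JSON.stringify(y);
    };
    return {
        idempotent: same(reduce(key, [reduce(key, all)]), reduce(key, all)),
        associative: same(reduce(key, [c, reduce(key, [a, b])]), reduce(key, [c, a, b])),
        commutative: same(reduce(key, [a, b]), reduce(key, [b, a])),
        sameType: typeof reduce(key, all) === typeof a
    };
};

var result = checkReducer(reducer, "c42", 10, 20, 30);
var badResult = checkReducer(badReducer, "c42", 10, 20, 30);
```

With these sample values, every field of result is true, while badResult.associative is false: the average of 30 and the partial average 15 is 22.5, not the true average of 20. A check like this only exercises the values it is given, so it can reveal a broken reducer but cannot prove that one is correct.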