
Inner join

An inner join requires the left and right tables to share a key column. If a key is duplicated on either the left or the right side, the join quickly blows up into something close to a Cartesian join: each key produces one output row for every pairing of its left and right rows. Such a join takes far longer to complete than one designed to minimize duplicate keys.
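To make the blow-up concrete, here is a minimal plain-Java sketch (no Hadoop involved) of a nested-loop inner join. The class name `DuplicateKeyJoinDemo`, the method `innerJoin`, and the sample rows are all hypothetical and chosen only for illustration. Key "1" appears twice on the left and three times on the right, so it alone contributes 2 x 3 = 6 joined rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateKeyJoinDemo {

    // Nested-loop inner join on the first element (the key) of each row.
    // Each output row has the form "key,leftValue,rightValue".
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> joined = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    joined.add(l[0] + "," + l[1] + "," + r[1]);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical data: key "1" is duplicated on both sides.
        List<String[]> left = Arrays.asList(
                new String[] {"1", "a"},
                new String[] {"1", "b"},
                new String[] {"2", "c"});
        List<String[]> right = Arrays.asList(
                new String[] {"1", "x"},
                new String[] {"1", "y"},
                new String[] {"1", "z"},
                new String[] {"3", "w"});

        List<String> joined = innerJoin(left, right);
        // Keys "2" and "3" have no partner, so only key "1" contributes rows.
        System.out.println(joined.size()); // prints 6
        joined.forEach(System.out::println);
    }
}
```

With unique keys on both sides, the same inputs would produce at most one row per key, which is why deduplicating or pre-aggregating keys before the join keeps the output size manageable.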

The following reducer emits a city and its average temperature only if the cityID has both a city record and at least one temperature record:

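// Nested inside the job's driver class. Requires imports of java.io.IOException,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text, and
// org.apache.hadoop.mapreduce.Reducer.
private static class InnerJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Reset per key so a city name seen for an earlier cityID cannot leak into this one.
        String cityName = null;
        int sum = 0;
        int n = 0;

        for (Text val : values) {
            String strVal = val.toString();
            if (strVal.length() <= 3) {
                // Values of at most three characters are treated as temperatures.
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                // Anything longer is treated as the city name.
                cityName = strVal;
            }
        }

        // Inner join: emit only when the cityID has both a name and a temperature.
        if (n != 0 && cityName != null) {
            result.set(sum / n); // integer average
            context.write(new Text(cityName), result);
        }
    }
}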

The output is as follows (city-6 and Las Vegas, which appeared in the original output shown earlier, are absent):

Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
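The reducer's per-key rule can be exercised without a cluster. The sketch below is a Hadoop-free approximation, not the book's code: the class `InnerJoinRuleDemo`, the method `joinOneKey`, and the value lists are hypothetical. It applies the same heuristic as the reducer (values of at most three characters are temperatures, longer values are city names) and returns null when either side is missing.

```java
import java.util.Arrays;
import java.util.List;

public class InnerJoinRuleDemo {

    // Mirrors the reducer's per-key rule. Returns "city<TAB>average" when the
    // key has both a city name and at least one temperature, otherwise null.
    static String joinOneKey(List<String> values) {
        String cityName = null;
        int sum = 0;
        int n = 0;
        for (String v : values) {
            if (v.length() <= 3) {
                sum += Integer.parseInt(v); // short value: a temperature
                n++;
            } else {
                cityName = v;               // long value: the city name
            }
        }
        if (cityName == null || n == 0) {
            return null;                    // one side missing: key is dropped
        }
        return cityName + "\t" + (sum / n); // integer average, as in the reducer
    }

    public static void main(String[] args) {
        // Hypothetical value lists, not the book's input data.
        System.out.println(joinOneKey(Arrays.asList("Boston", "20", "24"))); // city with average 22
        System.out.println(joinOneKey(Arrays.asList("Springfield")));        // null: no temperatures
        System.out.println(joinOneKey(Arrays.asList("15", "17")));           // null: no city name
    }
}
```

Note that the length check is a shortcut that only works while every city name is longer than three characters. Tagging each value in the mapper with its source would be a more robust way to tell the two sides apart.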