
Full outer join

A full outer join returns all rows, matched and unmatched, from the tables on both the left and right side of the join clause. We use it when we want to keep every row from both tables: a row appears in the result when its key has a match in either one of the tables, and the values from the missing side are filled with nulls or defaults. If it is used on tables with few keys in common, the result approaches the combined size of both tables, which can make the job slow.
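As a minimal sketch of these semantics outside Hadoop, the following plain Java joins two in-memory maps on their keys. The class name `FullOuterJoinSketch`, the method `fullOuterJoin`, and the sample rows (including the ID 7 given to Las Vegas) are assumptions made for illustration and are not taken from the book's data files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class FullOuterJoinSketch {

    // Joins two keyed tables; a missing side is represented by null.
    static List<String> fullOuterJoin(Map<Integer, String> left,
                                      Map<Integer, Integer> right) {
        // The union of both key sets drives the join, so no key is dropped.
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        List<String> rows = new ArrayList<>();
        for (Integer key : keys) {
            String name = left.get(key);          // null if only the right side has the key
            Integer temperature = right.get(key); // null if only the left side has the key
            rows.add(key + "," + name + "," + temperature);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the book's CSV files.
        Map<Integer, String> cities = new TreeMap<>();
        cities.put(1, "Boston");
        cities.put(7, "Las Vegas");

        Map<Integer, Integer> temperatures = new TreeMap<>();
        temperatures.put(1, 22);
        temperatures.put(6, 22);

        fullOuterJoin(cities, temperatures).forEach(System.out::println);
    }
}
```

With this sample data, key 1 is matched on both sides, key 6 exists only on the right, and key 7 exists only on the left, so all three keys appear in the result.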

In the following code, a cityID produces an output record if it has records in both the cities and temperatures tables, or if it exists in only one of them:

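// Nested in the job's driver class; the enclosing file needs these imports:
// java.io.IOException, org.apache.hadoop.io.IntWritable,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Reducer
private static class FullOuterJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    private final Text cityName = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int n = 0;
        // Reset the name for every key so that a city without a name record
        // does not inherit the name seen for the previous key.
        cityName.set("city-" + key);

        for (Text val : values) {
            String strVal = val.toString();
            if (strVal.length() <= 3) {
                // Values of up to three characters are treated as temperatures.
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                // Longer values are treated as the city name.
                cityName.set(strVal);
            }
        }
        // No temperature records: report 0 instead of dividing by zero.
        if (n == 0) n = 1;
        result.set(sum / n);
        context.write(cityName, result);
    }
}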

The output will be as follows:

Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
city-6 22 // city ID 6 has temperature measurements but no name in cities.csv
Las Vegas 0 // Las Vegas has no temperature measurements in temperature.csv
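To check the three cases in this output without running a Hadoop job, the reducer's per-key logic can be sketched in plain Java. This is only an approximation under stated assumptions: the class `ReducerLogicSketch`, the method `reduceOneKey`, the individual temperature readings, and the city ID 7 are hypothetical, and the `city-` fallback name is inferred from the output shown above.

```java
import java.util.Arrays;
import java.util.List;

final class ReducerLogicSketch {

    // Mirrors the per-key logic: short values are temperatures, longer ones are the name.
    static String reduceOneKey(String cityId, List<String> values) {
        String cityName = "city-" + cityId; // fallback when no name record arrives
        int sum = 0;
        int n = 0;
        for (String value : values) {
            if (value.length() <= 3) {
                sum += Integer.parseInt(value);
                n++;
            } else {
                cityName = value;
            }
        }
        int average = (n == 0) ? 0 : sum / n; // integer division; 0 when there are no readings
        return cityName + " " + average;
    }

    public static void main(String[] args) {
        // Hypothetical grouped values, as the shuffle phase might deliver them.
        System.out.println(reduceOneKey("1", Arrays.asList("Boston", "21", "23")));
        System.out.println(reduceOneKey("6", Arrays.asList("20", "24")));
        System.out.println(reduceOneKey("7", Arrays.asList("Las Vegas")));
    }
}
```

Note that telling names from temperatures by string length is fragile: a city name of three characters or fewer would be passed to `Integer.parseInt` and fail. Tagging each value with its source table in the mapper would be a sturdier design.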