Full outer join

Full outer join gives all (matched and unmatched) rows from the tables on the left and right sides of the join clause. We use it when we want to keep every row from both tables: a row is returned when its key has a match in either table, not only when it matches in both. Where one side has no matching row, that side's values are simply missing. If the tables have few keys in common, the result can be very large, and performance can suffer accordingly.
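To make the semantics concrete outside of MapReduce, here is a minimal sketch in plain Java that joins two in-memory maps on their keys. The class name `FullOuterJoinSketch`, the method `fullOuterJoin`, and the sample data are hypothetical and chosen only for illustration; they are not taken from cities.csv or temperature.csv.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

class FullOuterJoinSketch {

    // Joins two maps on their keys. Each result entry holds {leftValue, rightValue};
    // a side that has no entry for the key contributes null.
    static Map<Integer, String[]> fullOuterJoin(Map<Integer, String> left,
                                                Map<Integer, String> right) {
        // Union of the keys from both sides, sorted for deterministic output.
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        Map<Integer, String[]> joined = new LinkedHashMap<>();
        for (Integer key : keys) {
            joined.put(key, new String[] { left.get(key), right.get(key) });
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data for illustration only.
        Map<Integer, String> cities = new LinkedHashMap<>();
        cities.put(1, "Boston");
        cities.put(7, "Las Vegas");

        Map<Integer, String> temperatures = new LinkedHashMap<>();
        temperatures.put(1, "22");
        temperatures.put(6, "22");

        for (Map.Entry<Integer, String[]> e
                : fullOuterJoin(cities, temperatures).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()[0] + "\t" + e.getValue()[1]);
        }
    }
}
```

Key 1 appears on both sides and yields both values, while keys 6 and 7 each appear on one side only and yield a null for the missing side. The result has one entry per distinct key, which is why tables with few keys in common produce a result close to the combined size of both inputs.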

We keep a city and its temperatures whenever the cityID appears in both tables or in only one of them, as shown in the following code:

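    // Imports belong at the top of the enclosing driver class file.
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    private static class FullOuterJoinReducer
            extends Reducer<Text, Text, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        private final Text cityName = new Text();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Reset the name for every key. Otherwise a cityID with no record in
            // cities.csv would reuse the name left over from the previous key.
            cityName.set("city-" + key.toString());
            int sum = 0;
            int n = 0;

            for (Text val : values) {
                String strVal = val.toString();
                // Values of at most three characters are treated as temperature
                // readings; anything longer is treated as the city name.
                if (strVal.length() <= 3) {
                    sum += Integer.parseInt(strVal);
                    n += 1;
                } else {
                    cityName.set(strVal);
                }
            }
            // A city with no readings reports 0 instead of dividing by zero.
            if (n == 0) n = 1;
            result.set(sum / n); // integer average
            context.write(cityName, result);
        }
    }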

The output will be as follows:

Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
city-6 22 // city ID 6 has no name in cities.csv, only temperature measurements
Las Vegas 0 // Las Vegas has no temperature measurements in temperature.csv
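The reducer tells city names from temperature readings by string length, which would misclassify a city name of three characters or fewer. One possible alternative, sketched here in plain Java so it runs without Hadoop, is to have each mapper prefix its values with a tag. The tags `C:` and `T:`, the class `TaggedValueSketch`, the method `reduceOne`, and the sample values are all hypothetical and are not part of the original job.

```java
import java.util.Arrays;
import java.util.List;

class TaggedValueSketch {

    // Hypothetical tags that the two mappers would prepend to their values.
    static final String CITY_TAG = "C:";
    static final String TEMP_TAG = "T:";

    // Mirrors the reducer's logic for a single key and returns "<name>\t<average>".
    static String reduceOne(String cityId, List<String> values) {
        String cityName = "city-" + cityId; // fallback when no name record exists
        int sum = 0;
        int n = 0;
        for (String value : values) {
            if (value.startsWith(CITY_TAG)) {
                cityName = value.substring(CITY_TAG.length());
            } else if (value.startsWith(TEMP_TAG)) {
                sum += Integer.parseInt(value.substring(TEMP_TAG.length()));
                n++;
            }
        }
        int average = (n == 0) ? 0 : sum / n; // integer division, 0 when no readings
        return cityName + "\t" + average;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        System.out.println(reduceOne("1", Arrays.asList("C:Boston", "T:21", "T:23")));
        System.out.println(reduceOne("6", Arrays.asList("T:22")));
        System.out.println(reduceOne("7", Arrays.asList("C:Las Vegas")));
    }
}
```

With tags, the classification no longer depends on how long a city name or a reading happens to be, at the cost of a few extra bytes per shuffled value.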