- Big Data Analytics with Hadoop 3
- Sridhar Alla
- 2021-06-25 21:26:19
Full outer join
A full outer join returns all rows, matched and unmatched, from the tables on both the left and right sides of the join clause. Use it when you want to keep every row from both tables: a key produces an output row whenever it appears in at least one of the tables, not only when it appears in both. If the tables have little in common, the result can be very large, and performance can suffer accordingly.
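To make the semantics concrete, here is a minimal in-memory sketch in plain Java, independent of Hadoop. The class name `FullOuterJoinSketch`, the method `fullOuterJoin`, and the sample IDs (including ID 7 for Las Vegas) are made up for illustration and do not come from the book's data set. The join walks the union of the keys from both sides and leaves `null` where one side has no value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class FullOuterJoinSketch {
    // Joins two keyed tables; a key present on only one side still yields a row,
    // with null standing in for the missing value.
    static List<String> fullOuterJoin(Map<Integer, String> cities,
                                      Map<Integer, Integer> temps) {
        // Union of the keys from both sides, in sorted order.
        TreeSet<Integer> keys = new TreeSet<>(cities.keySet());
        keys.addAll(temps.keySet());

        List<String> rows = new ArrayList<>();
        for (Integer id : keys) {
            rows.add(id + "," + cities.get(id) + "," + temps.get(id));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> cities = new TreeMap<>();
        cities.put(1, "Boston");
        cities.put(7, "Las Vegas");   // hypothetical ID; no temperature on the other side

        Map<Integer, Integer> temps = new TreeMap<>();
        temps.put(1, 22);
        temps.put(6, 22);             // no city name on the other side

        for (String row : fullOuterJoin(cities, temps)) {
            System.out.println(row);
        }
    }
}
```

With the sample data in `main`, the three keys 1, 6, and 7 each produce one row, even though only key 1 is present in both maps.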

We include a city in the output whether its cityID has records in both tables or in only one of them, as shown in the following code:
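// At the top of the enclosing source file:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class:
private static class FullOuterJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    private final Text cityName = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Reset per key so a name from a previous cityID cannot leak through.
        // Assumes the key is the bare city ID; IDs with no record in
        // cities.csv are then reported as "city-<ID>".
        cityName.set("city-" + key.toString());
        int sum = 0;
        int n = 0;
        for (Text val : values) {
            String strVal = val.toString();
            if (strVal.length() <= 3) {
                // Short values are treated as temperature readings.
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                // Longer values are treated as the city name.
                cityName.set(strVal);
            }
        }
        // No readings for this city: report 0 rather than divide by zero.
        result.set(n == 0 ? 0 : sum / n);
        context.write(cityName, result);
    }
}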
The output will be as follows:
Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
city-6 22 // city ID 6 has temperature measurements in temperature.csv but no name in cities.csv
Las Vegas 0 // Las Vegas has a name in cities.csv but no temperature measurements in temperature.csv
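The reducer tells temperatures from city names by value length, so a city name of three or fewer characters would be passed to `Integer.parseInt` and fail. One common alternative, sketched below under stated assumptions, is for each mapper to tag its values explicitly. The `C:` and `T:` prefixes, the class `TaggedJoinSketch`, and the method `reduceOne` are hypothetical and are not part of the book's code. The sketch is plain Java that mirrors the per-key reduce logic so that it can run without a Hadoop cluster.

```java
import java.util.Arrays;
import java.util.List;

class TaggedJoinSketch {
    // Mirrors the reducer's per-key logic, but reads an explicit tag
    // ("C:" for a city name, "T:" for a temperature) instead of the value's length.
    static String reduceOne(String cityId, List<String> values) {
        String cityName = "city-" + cityId;   // default when no name record arrives
        int sum = 0;
        int n = 0;
        for (String val : values) {
            if (val.startsWith("T:")) {
                sum += Integer.parseInt(val.substring(2));
                n++;
            } else if (val.startsWith("C:")) {
                cityName = val.substring(2);
            }
        }
        int avg = (n == 0) ? 0 : sum / n;     // integer average; 0 when there are no readings
        return cityName + "\t" + avg;
    }

    public static void main(String[] args) {
        // Sample inputs are illustrative only.
        System.out.println(reduceOne("1", Arrays.asList("C:Boston", "T:21", "T:23")));
        System.out.println(reduceOne("6", Arrays.asList("T:22")));
        System.out.println(reduceOne("7", Arrays.asList("C:Las Vegas")));
    }
}
```

In a real job the same tags would be added in the two mappers, one reading cities.csv and one reading temperature.csv, and stripped in the reducer as shown here.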