- Big Data Analytics with Hadoop 3
- Sridhar Alla
- 2021-06-25 21:26:19
Inner join
An inner join requires the left and right tables to share a join column (here, cityID) and returns only the rows whose key appears in both tables. If a key is duplicated on either side, every left record for that key is paired with every right record for it, so the join quickly blows up into something close to a Cartesian product and takes far longer to complete than a design that keeps duplicate keys to a minimum.

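To make the blow-up concrete, here is a minimal in-memory sketch in plain Java. It is not Hadoop code, and the names `InnerJoinBlowUp`, `Row`, and `innerJoin` are hypothetical, chosen for illustration. It indexes the right-hand rows by key and probes that index with each left-hand row, so a key that appears m times on the left and n times on the right yields m × n output rows, while keys present on only one side are dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of an in-memory inner join on a shared key column. */
class InnerJoinBlowUp {

    /** A row is a (key, value) pair; the key is the shared join column. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Emits one "key,left,right" string per matching pair, like a SQL inner join. */
    static List<String> innerJoin(List<Row> left, List<Row> right) {
        // Build phase: group the right-hand rows by key.
        Map<String, List<Row>> index = new HashMap<>();
        for (Row r : right) {
            index.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        // Probe phase: each left row pairs with every right row sharing its key.
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            List<Row> matches = index.get(l.key);
            if (matches == null) {
                continue; // no partner on the right, so the inner join drops it
            }
            for (Row r : matches) {
                out.add(l.key + "," + l.value + "," + r.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = new ArrayList<>();
        List<Row> right = new ArrayList<>();
        // Key "1" appears 3 times on the left and 4 times on the right.
        for (int i = 0; i < 3; i++) left.add(new Row("1", "L" + i));
        for (int i = 0; i < 4; i++) right.add(new Row("1", "R" + i));
        left.add(new Row("2", "only-left"));   // unmatched: dropped
        right.add(new Row("3", "only-right")); // unmatched: dropped
        System.out.println(innerJoin(left, right).size()); // prints 12 (3 * 4)
    }
}
```

Three duplicates on one side and four on the other already produce twelve rows for a single key; with thousands of duplicates per key the output, and the time to produce it, grows multiplicatively.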
We keep a city and its temperatures only if the cityID has both a city-name record and at least one temperature record, as shown in the following reducer:
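```java
// Nested inside the job's driver class; the enclosing file must import
// java.io.IOException, org.apache.hadoop.io.IntWritable,
// org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Reducer.
private static class InnerJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    private final Text cityName = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int n = 0;
        // Reset per key: Hadoop reuses the reducer instance across keys, so a
        // name left over from the previous cityID must not leak into this one.
        boolean hasName = false;
        for (Text val : values) {
            String strVal = val.toString();
            if (strVal.length() <= 3) {
                // Short values are treated as temperature readings.
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                // Longer values are treated as the city name.
                cityName.set(strVal);
                hasName = true;
            }
        }
        // Inner join: emit only when this cityID has both a name and temperatures.
        if (n != 0 && hasName) {
            result.set(sum / n); // integer (truncated) average
            context.write(cityName, result);
        }
    }
}
```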
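Running the reducer requires a Hadoop job, so the following hypothetical plain-Java sketch simulates the same reduce-side inner join in memory. The class `ReduceSideInnerJoinSim` and its methods are illustrative names, not part of Hadoop or the book, and the input values are made up. A `TreeMap` stands in for the shuffle, which groups values by cityID in sorted key order, and each group is reduced with the reducer's rule: a value of at most three characters is a temperature, and anything longer is the city name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free simulation of a reduce-side inner join. */
class ReduceSideInnerJoinSim {

    /** Stands in for the shuffle: groups (cityID, value) pairs by key, sorted by key. */
    static Map<String, List<String>> shuffle(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    /** Applies the reducer's per-key logic; returns city name -> truncated average. */
    static Map<String, Integer> innerJoin(List<String[]> mapOutput) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : shuffle(mapOutput).entrySet()) {
            String cityName = null; // reset for every key
            int sum = 0;
            int n = 0;
            for (String v : group.getValue()) {
                if (v.length() <= 3) {
                    sum += Integer.parseInt(v); // assumed: short value = temperature
                    n++;
                } else {
                    cityName = v;               // assumed: long value = city name
                }
            }
            // Inner join: keep the key only if both sides contributed a record.
            if (n != 0 && cityName != null) {
                result.put(cityName, sum / n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = new ArrayList<>();
        mapOutput.add(new String[] {"1", "Boston"});    // made-up sample values
        mapOutput.add(new String[] {"1", "20"});
        mapOutput.add(new String[] {"1", "24"});
        mapOutput.add(new String[] {"6", "25"});        // temperature, no name
        mapOutput.add(new String[] {"7", "Las Vegas"}); // name, no temperature
        System.out.println(innerJoin(mapOutput));        // prints {Boston=22}
    }
}
```

Keys 6 and 7 are dropped because each lacks one side of the join. Declaring `cityName` inside the loop is what prevents a name from an earlier key being paired with the temperatures of a later, nameless key.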
The output is shown below. Unlike the original output shown earlier, it contains neither city-6 nor Las Vegas, because an inner join drops any cityID that lacks either a city name or temperature readings:
```
Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
```