Title: Big Data Analytics with Hadoop 3
Author: Sridhar Alla
Updated: 2021-06-25 21:26:19

Inner join

An inner join requires the left and right tables to share a common key column. If a key appears more than once on either the left or the right side, every left copy is paired with every right copy, so the join quickly degenerates into something close to a cartesian join and takes far longer to complete than a join designed to minimize duplicate keys.
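To make the cost of duplicate keys concrete, here is a minimal sketch in plain Java rather than MapReduce. The class name `DuplicateKeyJoinSketch`, the nested-loop join, and the sample rows are all assumptions made for illustration; none of them come from the book's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the book's example project.
class DuplicateKeyJoinSketch {

    // Each row is {key, value}. A nested-loop inner join emits one output
    // row for every pair of left and right rows that share the same key.
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> joined = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    joined.add(l[0] + "," + l[1] + "," + r[1]);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up data: key "1" appears twice on the left, three times on the right.
        List<String[]> left = new ArrayList<>();
        left.add(new String[] {"1", "Boston"});
        left.add(new String[] {"1", "Boston (duplicate)"});

        List<String[]> right = new ArrayList<>();
        right.add(new String[] {"1", "21"});
        right.add(new String[] {"1", "22"});
        right.add(new String[] {"1", "23"});

        // 2 left copies x 3 right copies = 6 joined rows for a single key.
        System.out.println(innerJoin(left, right).size()); // prints 6
    }
}
```

With m copies of a key on one side and n on the other, the join emits m × n rows for that key, which is why deduplicating keys before joining pays off.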

We will output a city and its average temperature only if the cityID has both a city record and at least one temperature record, as shown in the following code:

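// Requires java.io.IOException, org.apache.hadoop.io.IntWritable,
// org.apache.hadoop.io.Text, and org.apache.hadoop.mapreduce.Reducer.
private static class InnerJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<Text> values,
                       Context context) throws IOException, InterruptedException {
        int sum = 0;
        int n = 0;
        // Reset for every key so that a city name seen for a previous
        // cityID cannot leak into a key that has no city record.
        Text cityName = null;

        for (Text val : values) {
            String strVal = val.toString();
            // Values of up to three characters are treated as temperatures;
            // anything longer is taken to be the city name.
            if (strVal.length() <= 3) {
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                cityName = new Text(strVal);
            }
        }
        // Inner join: emit only when the key has both a city name and at
        // least one temperature. The average uses integer division.
        if (n != 0 && cityName != null) {
            result.set(sum / n);
            context.write(cityName, result);
        }
    }
}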
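The reducer's filtering can be tried without a Hadoop cluster. The following is a rough sketch in plain Java that mimics the per-key logic on values that have already been grouped by cityID. The class name `InnerJoinReduceSketch`, the helper names, the cityID keys, and the temperature values are all invented for illustration and are not taken from the book's data files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reducer; it uses no Hadoop classes.
class InnerJoinReduceSketch {

    // Mirrors the per-key logic: strings of up to three characters are
    // treated as temperatures, anything longer as the city name.
    // Returns null when the key lacks either side of the join.
    static String reduceOneKey(List<String> values) {
        int sum = 0;
        int n = 0;
        String cityName = null;
        for (String v : values) {
            if (v.length() <= 3) {
                sum += Integer.parseInt(v);
                n++;
            } else {
                cityName = v;
            }
        }
        if (n == 0 || cityName == null) {
            return null; // inner join: drop keys without both record types
        }
        return cityName + " " + (sum / n); // integer average, as in the reducer
    }

    // Applies reduceOneKey to every group and keeps only the joined rows.
    static List<String> reduceAll(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            String line = reduceOneKey(values);
            if (line != null) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up grouped input, keyed by an assumed cityID.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("1", Arrays.asList("Boston", "22", "23")); // both sides present
        grouped.put("6", Arrays.asList("20", "24"));           // temperatures only
        grouped.put("7", Arrays.asList("Las Vegas"));          // city name only

        System.out.println(reduceAll(grouped)); // prints [Boston 22]
    }
}
```

Only the key with both a name and temperatures survives, which is the inner join behavior. Note that telling names from temperatures by string length is fragile: a city name of three characters or fewer would be parsed as a number and fail.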

The output is as follows (without city-6 or Las Vegas, both of which appeared in the original output shown earlier):

Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22
