Perform an action - count by value

Finally, we're going to perform an action on our RDD. So far we've transformed the RDD into the form that we want. We took our raw input data and created an RDD that contains nothing but ratings as its values. Now we can perform an action with this line of code:

result = ratings.countByValue() 

Here we take our ratings RDD, which in our example holds just the rating values 3, 3, 1, 2, and 1, and call an action method on it, countByValue. This is a very easy shortcut for quickly creating something like a histogram.

countByValue counts up how many times each unique value in the RDD occurs. In this particular example, the rating 3 occurs twice, the rating 1 occurs twice, and the rating 2 occurs only once, so that is the output we'll get: pairs, or tuples if you will, of a rating and the number of times it occurred, namely (3, 2), (1, 2), and (2, 1).
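To make the counting concrete without a Spark installation, here is a minimal pure-Python sketch of the same tally. It uses collections.Counter on a plain list as a local stand-in for countByValue on an RDD; that substitution is an assumption made for illustration, not how Spark computes the result.

```python
from collections import Counter

# The five example rating values from the text, held in a plain list
# rather than an RDD (assumption: a local stand-in for illustration).
ratings = [3, 3, 1, 2, 1]

# Counter tallies how many times each unique value occurs, which is the
# same per-value count that countByValue is described as producing.
result = Counter(ratings)

# Print each (rating, count) pair.
for rating, count in result.items():
    print(rating, count)
```

The counts match the description above: two 3s, two 1s, and one 2.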

This is what will end up in our result object. All that's left to do at this point is to print it out. Because countByValue is an action, what it returns is just a plain old Python object, a dictionary of values and their counts, and no longer an RDD. That means we can sort the results with ordinary Python, which is the final thing we do.
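One way that final sort-and-print step could look is sketched below. It runs on a hand-written dictionary holding the counts from the example rather than on a real return value from Spark, and the name sorted_results is hypothetical.

```python
# Counts for the example ratings, written out by hand as a plain dict
# (assumption: stands in for the object returned by countByValue).
result = {3: 2, 1: 2, 2: 1}

# Sort the (rating, count) pairs by rating so the output reads like a histogram.
sorted_results = sorted(result.items())

# Print one line per rating: the rating, then how often it occurred.
for rating, count in sorted_results:
    print(f"{rating} {count}")
```

Sorting the items gives a list of tuples ordered by rating, which is usually the most readable order for a small histogram like this one.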
