官术网_书友最值得收藏!

  • PySpark Cookbook
  • Denny Lee Tomasz Drabas
  • 104字
  • 2021-06-18 19:06:39

.reduceByKey(...) transformation

The reduceByKey(f) transformation reduces the elements of the RDD using f by the key. The f function should be commutative and associative so that it can be computed correctly in parallel.

Look at the following code snippet:

# Determine delays by originating city
# - remove header row via zipWithIndex()
# and map()
(
flights
.zipWithIndex()
.filter(lambda (row, idx): idx > 0)
.map(lambda (row, idx): row)
.map(lambda c: (c[3], int(c[1])))
.reduceByKey(lambda x, y: x + y)
.take(5)
)

This will generate the following output:

# Output
[(u'JFK', 387929),
(u'MIA', 169373),
(u'LIH', -646),
(u'LIT', 34489),
(u'RDM', 3445)]
主站蜘蛛池模板: 冷水江市| 保靖县| 剑阁县| 湟源县| 钟山县| 临汾市| 吴川市| 两当县| 法库县| 巢湖市| 南江县| 特克斯县| 页游| 桐庐县| 郧西县| 获嘉县| 西充县| 绥江县| 张掖市| 竹北市| 蛟河市| 呼伦贝尔市| 新丰县| 当涂县| 南溪县| 太湖县| 德安县| 靖宇县| 大宁县| 东安县| 金昌市| 山东省| 大庆市| 永善县| 株洲县| 金寨县| 道孚县| 绩溪县| 洪江市| 仙游县| 革吉县|