- PySpark Cookbook
- Denny Lee Tomasz Drabas
- 2021-06-18 19:06:39
.reduceByKey(...) transformation
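The reduceByKey(f) transformation works on an RDD of (key, value) pairs: it merges all values that share a key into a single value using f. The f function should be commutative and associative so that the result is the same no matter how the values are split across partitions and in which order the partial results are combined.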
Look at the following code snippet:
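# Determine total delays by originating city
# - remove the header row via zipWithIndex(), filter() and map()
# - index the (row, idx) pair explicitly: tuple-unpacking lambda
#   parameters are a syntax error in Python 3
(
    flights
    .zipWithIndex()
    .filter(lambda row_idx: row_idx[1] > 0)
    .map(lambda row_idx: row_idx[0])
    .map(lambda c: (c[3], int(c[1])))
    .reduceByKey(lambda x, y: x + y)
    .take(5)
)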
This will generate the following output:
# Output
[(u'JFK', 387929),
(u'MIA', 169373),
(u'LIH', -646),
(u'LIT', 34489),
(u'RDM', 3445)]
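The requirement that f be commutative and associative can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark's actual implementation: the helper reduce_by_key and the sample (origin, delay) pairs are hypothetical. It assumes a simplified model in which values are first combined per key inside each partition and the partial results are then merged across partitions.

```python
from operator import add, sub


def reduce_by_key(partitions, f):
    """Simulate reduceByKey: combine per key within each partition,
    then merge the partial results across partitions (simplified model)."""
    merged = {}
    for partition in partitions:
        # Step 1: combine the values for each key inside one partition.
        local = {}
        for key, value in partition:
            local[key] = f(local[key], value) if key in local else value
        # Step 2: merge this partition's partial results into the total.
        for key, value in local.items():
            merged[key] = f(merged[key], value) if key in merged else value
    return merged


# Hypothetical (origin, delay) pairs spread over two partitions.
partitions = [
    [("JFK", 10), ("MIA", -3), ("JFK", 5)],
    [("MIA", 7), ("JFK", -2)],
]

# Addition is commutative and associative: partition order does not matter.
sums_a = reduce_by_key(partitions, add)
sums_b = reduce_by_key(partitions[::-1], add)

# Subtraction is neither: the result depends on partition order.
diff_a = reduce_by_key(partitions, sub)
diff_b = reduce_by_key(partitions[::-1], sub)

print(sums_a == sums_b)  # True
print(diff_a == diff_b)  # False
```

With addition, both partition orders give the same totals per city. With subtraction, the totals change when the partitions are processed in a different order, which is why such a function is unsafe to pass to reduceByKey.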