- PySpark Cookbook
- Denny Lee Tomasz Drabas
- 2021-06-18 19:06:40
.reduce(...) action
The reduce(f) action aggregates the elements of an RDD by f. The f function should be commutative and associative so that it can be computed correctly in parallel. Look at the following code:
# Calculate the total delays of flights
# between SEA (origin) and SFO (dest),
# convert delays column to int
# and summarize
flights\
.filter(lambda c: c[3] == 'SEA' and c[4] == 'SFO')\
.map(lambda c: int(c[1]))\
.reduce(lambda x, y: x + y)
This will produce the following result:
# Output
22293
One important caveat applies here. The function passed to reduce() must be associative and commutative; that is, neither the grouping of the operations nor the order of the operands may change the result.
Associativity rule: (6 + 3) + 4 = 6 + (3 + 4)
Commutativity rule: 6 + 3 + 4 = 4 + 3 + 6
Errors can occur if you ignore these rules.
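The two rules can be spot-checked in plain Python before a function is handed to reduce(). The following is only a minimal sketch: the helper names is_commutative and is_associative are hypothetical (they are not part of PySpark), and testing a few sample values can reveal a violation but cannot prove that a function is safe in general.

```python
from operator import add, truediv


def is_commutative(f, a, b):
    # Hypothetical helper: does swapping the operands give the same result?
    return f(a, b) == f(b, a)


def is_associative(f, a, b, c):
    # Hypothetical helper: does regrouping the operations give the same result?
    return f(f(a, b), c) == f(a, f(b, c))


# Addition passes both checks for the sample values used in the rules above.
add_ok = is_commutative(add, 6, 3) and is_associative(add, 6, 3, 4)

# Division fails: 6 / 3 differs from 3 / 6.
div_ok = is_commutative(truediv, 6, 3) and is_associative(truediv, 6, 3, 4)

print(add_ok)
print(div_ok)
```

Addition passes this check and division does not, which is why division is used as the counterexample.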
As an example, see the following RDD (with one partition only!):
data_reduce = sc.parallelize([1, 2, .5, .1, 5, .2], 1)
If we reduce the data by dividing the current result by the next element, we would expect a value of 10:
works = data_reduce.reduce(lambda x, y: x / y)
Partitioning the data into three partitions will produce an incorrect result:
data_reduce = sc.parallelize([1, 2, .5, .1, 5, .2], 3)
data_reduce.reduce(lambda x, y: x / y)
It will produce 0.004.
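To see where the two values come from, the partitioned reduction can be imitated without a Spark cluster. This is a rough sketch under stated assumptions, not Spark's actual implementation: the helper simulate_partitioned_reduce is hypothetical, it assumes the list is split into contiguous, equally sized chunks, and it assumes the per-partition results are merged from left to right (Spark does not promise a particular merge order).

```python
from functools import reduce


def simulate_partitioned_reduce(values, num_partitions, f):
    # Hypothetical helper: split the list into contiguous chunks (assumption),
    # reduce each chunk on its own, then reduce the partial results in order.
    size = -(-len(values) // num_partitions)  # ceiling division
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [reduce(f, part) for part in partitions if part]
    return reduce(f, partials)


def divide(x, y):
    return x / y


data = [1, 2, .5, .1, 5, .2]

# One partition: ((((1 / 2) / .5) / .1) / 5) / .2, which is about 10
one_partition = simulate_partitioned_reduce(data, 1, divide)

# Three partitions: (1 / 2), (.5 / .1), (5 / .2) are computed separately,
# then merged as (0.5 / 5) / 25, which is about 0.004
three_partitions = simulate_partitioned_reduce(data, 3, divide)

print(one_partition, three_partitions)
```

Because division is neither associative nor commutative, regrouping the same six numbers into three chunks changes the answer, even though no element was added or removed.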