- PySpark Cookbook
- Denny Lee, Tomasz Drabas
- 2021-06-18 19:06:39
.mapPartitionsWithIndex(...) transformation
The .mapPartitionsWithIndex(f) transformation is similar to .map(...), but it runs the function f once per partition rather than once per element, and passes f the index of the partition together with an iterator over that partition's elements. It is useful for checking data skew across partitions, as in the following snippet:
# Source: https://stackoverflow.com/a/38957067/1100699
def partitionElementCount(idx, iterator):
    count = 0
    for _ in iterator:
        count += 1
    return idx, count

# Use mapPartitionsWithIndex to determine the number of elements in each partition
flights.mapPartitionsWithIndex(partitionElementCount).collect()
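The preceding code will produce the following result, a flat list in which each partition index is followed by that partition's element count (Spark iterates over the tuple the function returns, so the pairs are flattened):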
# Output
[0, 174293,
 1, 174020,
 2, 173849,
 3, 174006,
 4, 173864,
 5, 174308,
 6, 173620,
 7, 173618]
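If (index, count) pairs are easier to work with than a flat list, one option is to yield the tuple from the function instead of returning it. The following is a minimal sketch, not code from the book: the names partition_element_count_pairs, count_per_partition, and skew_ratio are hypothetical helpers introduced here for illustration, and the skew measure (largest partition divided by the mean partition size) is just one simple choice.

```python
def partition_element_count_pairs(idx, iterator):
    """Yield a single (partition index, element count) tuple for one partition."""
    count = sum(1 for _ in iterator)
    # Yielding makes this function a generator, so Spark receives an iterable
    # that contains one tuple instead of iterating over the tuple itself.
    yield idx, count


def count_per_partition(rdd):
    """Collect (partition index, element count) tuples for any RDD."""
    return rdd.mapPartitionsWithIndex(partition_element_count_pairs).collect()


def skew_ratio(pairs):
    """Largest partition size divided by the mean size (1.0 means balanced)."""
    counts = [count for _, count in pairs]
    if not counts:
        return 0.0
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0
```

Assuming the same flights RDD as above, count_per_partition(flights) should return a list of tuples such as (0, 174293) rather than alternating values, and passing that list to skew_ratio gives a single number that stays close to 1.0 when the partitions are evenly sized.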