- PySpark Cookbook
- Denny Lee Tomasz Drabas
- 2021-06-18 19:06:38
.repartition(...) transformation
The repartition(n) transformation redistributes the RDD into n partitions by performing a full shuffle that spreads the data roughly uniformly across the cluster. As noted in the preceding recipes, this can improve performance by allowing more tasks to run in parallel, at the cost of moving data over the network. Here's a code snippet that does precisely that:
# The flights RDD originally generated has 2 partitions
flights.getNumPartitions()
# Output
2
# Repartition the flights RDD into 8 partitions
flights2 = flights.repartition(8)
# Checking the number of partitions for the flights2 RDD
flights2.getNumPartitions()
# Output
8
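To make the "reshuffle and distribute uniformly" idea concrete, here is a minimal pure-Python sketch of a round-robin redistribution. It is a simplified toy model and not Spark's actual code: the function name simulate_repartition, the seed parameter, and the sample data are all hypothetical, and the only assumption carried over from the text is that elements end up spread roughly evenly over n partitions.

```python
import random


def simulate_repartition(partitions, n, seed=0):
    """Toy model (not Spark code): spread elements of existing
    partitions across n new partitions in round-robin fashion."""
    rng = random.Random(seed)
    new_partitions = [[] for _ in range(n)]
    for part in partitions:
        # Each source partition starts at a random target partition,
        # then hands out its elements one target at a time.
        position = rng.randrange(n)
        for element in part:
            new_partitions[position % n].append(element)
            position += 1
    return new_partitions


# Two unevenly sized input partitions (hypothetical data)
old = [list(range(0, 90)), list(range(90, 100))]
new = simulate_repartition(old, 8)
sizes = [len(p) for p in new]
print(len(new), sizes)
```

Even though the input is skewed (90 elements versus 10), the resulting partition sizes differ by at most a couple of elements. On a real RDD, flights2.glom().map(len).collect() is one way to inspect the partition sizes after repartition(8).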