Book title: PySpark Cookbook
Authors: Denny Lee, Tomasz Drabas
Section word count: 91
Last updated: 2021-06-18 19:06:38

.repartition(...) transformation

The repartition(n) transformation redistributes the RDD into n partitions by performing a full shuffle that spreads the data roughly uniformly across the cluster. As noted in the preceding recipes, this can improve performance by allowing more tasks to run in parallel. Here's a code snippet that does precisely that:

# The flights RDD originally generated has 2 partitions 
flights.getNumPartitions()

# Output
2 

# Let's repartition this RDD into 8 partitions
flights2 = flights.repartition(8)

# Checking the number of partitions for the flights2 RDD
flights2.getNumPartitions()

# Output
8
