
.mapPartitionsWithIndex(...) transformation

The mapPartitionsWithIndex(f) transformation is similar to map, but it runs f once per partition rather than once per element, passing f the index of the partition together with an iterator over that partition's elements. It is useful for checking data skew across partitions, as in the following snippet:

# Source: https://stackoverflow.com/a/38957067/1100699
def partitionElementCount(idx, iterator):
    count = 0
    for _ in iterator:
        count += 1
    return idx, count

# Use mapPartitionsWithIndex to determine the element count of each partition
flights.mapPartitionsWithIndex(partitionElementCount).collect()

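The preceding code produces a flat list that alternates partition index and element count, because the tuple returned for each partition is flattened into the resulting RDD: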

# Output
[0,
174293,
1,
174020,
2,
173849,
3,
174006,
4,
173864,
5,
174308,
6,
173620,
7,
173618]
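
The flat list above can be awkward to read. As a minimal sketch (not part of the original snippet), the variant below yields one (index, count) tuple per partition, so the collected result should be a list of pairs. The names partition_element_count_pairs and skew_ratio are hypothetical helpers introduced here for illustration. The function is plain Python, so it can be checked locally with simulated partitions before being passed to mapPartitionsWithIndex on the flights RDD.

```python
def partition_element_count_pairs(idx, iterator):
    """Yield a single (partition index, element count) tuple.

    Hypothetical variant of partitionElementCount: yielding the tuple
    keeps the index and count together as one element of the result.
    """
    count = sum(1 for _ in iterator)
    yield idx, count


def skew_ratio(pairs):
    """Hypothetical helper: largest partition size divided by the mean size.

    A value close to 1.0 suggests evenly sized partitions.
    """
    counts = [count for _, count in pairs]
    return max(counts) / (sum(counts) / len(counts))


# Local check without Spark: simulate two partitions with plain iterators.
simulated = [
    pair
    for idx, part in enumerate([["a", "b", "c"], ["d"]])
    for pair in partition_element_count_pairs(idx, iter(part))
]
print(simulated)              # [(0, 3), (1, 1)]
print(skew_ratio(simulated))  # 1.5

# With Spark, assuming the `flights` RDD from the text is available:
# pairs = flights.mapPartitionsWithIndex(partition_element_count_pairs).collect()
# print(skew_ratio(pairs))
```

Applied to the counts shown above, the largest partition (174308 elements) and the smallest (173618) differ by well under one percent, so the data looks evenly distributed.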