
.union(...) transformation

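The union(RDD) transformation returns a new RDD that contains every element of the source RDD together with every element of the argument RDD. Duplicates are kept rather than removed. The following snippet builds two RDDs from airports and combines them: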

# Create `a` RDD of Washington airports
a = (
    airports
    .zipWithIndex()
    .filter(lambda pair: pair[1] > 0)   # skip the first row (index 0)
    .map(lambda pair: pair[0])          # keep the row, discard the index
    .filter(lambda c: c[1] == "WA")     # second field holds the state code
)

# Create `b` RDD of British Columbia airports
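b = (
    airports
    .zipWithIndex()
    .filter(lambda pair: pair[1] > 0)   # skip the first row (index 0)
    .map(lambda pair: pair[0])          # keep the row, discard the index
    .filter(lambda c: c[1] == "BC")     # second field holds the province code
)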

# Union WA and BC airports
a.union(b).collect()

This will generate the following output:

# Output
[[u'Bellingham', u'WA', u'USA', u'BLI'],
[u'Moses Lake', u'WA', u'USA', u'MWH'],
[u'Pasco', u'WA', u'USA', u'PSC'],
[u'Pullman', u'WA', u'USA', u'PUW'],
[u'Seattle', u'WA', u'USA', u'SEA'],
...
[u'Vancouver', u'BC', u'Canada', u'YVR'],
[u'Victoria', u'BC', u'Canada', u'YYJ'],
[u'Williams Lake', u'BC', u'Canada', u'YWL']]
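
The behavior of the pipeline above can be checked without a cluster. What follows is a minimal plain-Python sketch, not Spark code: lists stand in for RDDs, list concatenation stands in for union, and the small `airports` sample (including its header row) is an assumption made up for illustration from a few of the rows shown in the output. The helper `rows_for` is hypothetical as well. It shows that the result is all of `a` followed by all of `b`, and that combining a dataset with itself doubles the rows because nothing is deduplicated.

```python
# Plain-Python model of the pipeline above; no Spark required.
# The header row and the sample rows are illustrative stand-ins for `airports`.
airports = [
    ["City", "State", "Country", "IATA"],   # assumed header row
    ["Bellingham", "WA", "USA", "BLI"],
    ["Vancouver", "BC", "Canada", "YVR"],
    ["Seattle", "WA", "USA", "SEA"],
    ["Victoria", "BC", "Canada", "YYJ"],
]


def rows_for(state):
    """Mimic zipWithIndex -> drop index 0 -> keep rows whose second field matches."""
    indexed = [(row, idx) for idx, row in enumerate(airports)]  # (row, idx) pairs
    return [row for row, idx in indexed if idx > 0 and row[1] == state]


a = rows_for("WA")
b = rows_for("BC")

# Modeled union: plain concatenation, so every element of both inputs survives.
union_ab = a + b
union_aa = a + a          # the same rows appear twice

# Modeled distinct: lists are not hashable, so convert rows to tuples first.
distinct_aa = sorted(set(tuple(row) for row in union_aa))

print(len(union_ab), len(union_aa), len(distinct_aa))
```

In PySpark, the corresponding way to drop repeated rows after a union is to chain `.distinct()`, as in `a.union(a).distinct()`. Rows stored as lists would likely need converting to tuples first, since deduplication relies on hashing.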