
  • PySpark Cookbook
  • Denny Lee Tomasz Drabas
  • 103 words
  • 2021-06-18 19:06:38

.join(...) transformation

The join(RDD') transformation is called on an RDD of (key, val_left) pairs with another RDD of (key, val_right) pairs, and returns an RDD of (key, (val_left, val_right)) for every pair of records whose keys match. Outer joins are supported through leftOuterJoin(...), rightOuterJoin(...), and fullOuterJoin(...).
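To make the result shape concrete, here is a minimal plain-Python sketch of the inner-join semantics described above. It is not how Spark implements the join; the helper name join_pairs and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def join_pairs(left, right):
    """Emulate the result shape of an inner join on two lists of (key, value) pairs."""
    # Group the right-hand values by key so they can be looked up per left record.
    right_by_key = defaultdict(list)
    for key, val_right in right:
        right_by_key[key].append(val_right)

    # Emit one (key, (val_left, val_right)) record for every matching pair.
    # Keys that appear on only one side produce no output.
    return [
        (key, (val_left, val_right))
        for key, val_left in left
        for val_right in right_by_key.get(key, [])
    ]


# Hypothetical sample rows (not taken from any real dataset).
left_rows = [("JFK", "01010900"), ("JFK", "01011200"), ("SEA", "01010710")]
right_rows = [("JFK", "NY"), ("LAX", "CA")]

joined = join_pairs(left_rows, right_rows)
print(joined)
```

In this sketch both JFK records on the left are paired with the single JFK record on the right, while SEA and LAX are dropped because each appears on one side only.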

Look at the following code snippet:

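# Flights data: key each record by airport code (column 3),
# keeping column 0 as the value
# e.g. (u'JFK', u'01010900')
flt = flights.map(lambda c: (c[3], c[0]))

# Airports data: key each record by airport code (column 3),
# keeping column 1 as the value
# e.g. (u'JFK', u'NY')
air = airports.map(lambda c: (c[3], c[1]))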

# Execute inner join between RDDs
flt.join(air).take(5)

This returns a result similar to the following (the order of the joined records is not guaranteed):

# Output
[(u'JFK', (u'01010900', u'NY')),
 (u'JFK', (u'01011200', u'NY')),
 (u'JFK', (u'01011900', u'NY')),
 (u'JFK', (u'01011700', u'NY')),
 (u'JFK', (u'01010800', u'NY'))]
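
The outer variants differ from the inner join only in how they treat keys without a match. As a rough plain-Python sketch of the left outer case (again not Spark's implementation; the helper name left_outer_join_pairs, the sample rows, and the airport code ZZZ are hypothetical), unmatched left records are kept and paired with None:

```python
from collections import defaultdict


def left_outer_join_pairs(left, right):
    """Emulate the result shape of a left outer join on lists of (key, value) pairs."""
    # Group the right-hand values by key.
    right_by_key = defaultdict(list)
    for key, val_right in right:
        right_by_key[key].append(val_right)

    result = []
    for key, val_left in left:
        # Left records with no matching key are kept and paired with None.
        matches = right_by_key.get(key) or [None]
        for val_right in matches:
            result.append((key, (val_left, val_right)))
    return result


# Hypothetical sample rows; "ZZZ" is a made-up code with no airport record.
flt_sample = [("JFK", "01010900"), ("ZZZ", "01011200")]
air_sample = [("JFK", "NY")]

outer = left_outer_join_pairs(flt_sample, air_sample)
print(outer)
```

With the RDDs from the snippet above, the corresponding Spark call would be flt.leftOuterJoin(air), which keeps every flight record even when its airport code is missing from air.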