- PySpark Cookbook
- Denny Lee, Tomasz Drabas
- 2021-06-18 19:06:38
.join(...) transformation
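The join(RDD') transformation is called on an RDD of (key, val_left) pairs with an RDD' of (key, val_right) pairs. It returns an RDD of (key, (val_left, val_right)) pairs, one for each combination of left and right values that share a key. Keys that appear on only one side are dropped. Outer joins are supported through the leftOuterJoin, rightOuterJoin, and fullOuterJoin transformations. These keep unmatched keys and fill the missing side with None.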
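To make these semantics concrete without a Spark cluster, here is a minimal plain-Python model of what the four pair-RDD joins return. The helper names (`join`, `left_outer_join`, `right_outer_join`, `full_outer_join`) and the toy records (including the SEA flight and the BOS airport) are hypothetical. This is not how Spark computes joins: Spark shuffles data across partitions and does not guarantee output order. The model only shows which pairs come out.

```python
from collections import defaultdict

# Plain-Python model of pair-RDD join semantics, for illustration only.
# Spark produces the same pairs via a distributed shuffle, and its output
# order is not guaranteed; these helpers are hypothetical, not Spark APIs.


def _group(pairs):
    """Group (key, value) pairs into {key: [values]}, keeping first-seen key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join(left, right):
    """Inner join: only keys on both sides, every left/right value combination."""
    l, r = _group(left), _group(right)
    return [(k, (lv, rv)) for k in l if k in r for lv in l[k] for rv in r[k]]


def left_outer_join(left, right):
    """Keep every left pair; a key missing on the right gets None."""
    l, r = _group(left), _group(right)
    return [(k, (lv, rv)) for k in l for lv in l[k] for rv in r.get(k, [None])]


def right_outer_join(left, right):
    """Keep every right pair; a key missing on the left gets None."""
    l, r = _group(left), _group(right)
    return [(k, (lv, rv)) for k in r for lv in l.get(k, [None]) for rv in r[k]]


def full_outer_join(left, right):
    """Keep every key from either side; the missing side gets None."""
    l, r = _group(left), _group(right)
    keys = list(l) + [k for k in r if k not in l]
    return [(k, (lv, rv))
            for k in keys
            for lv in l.get(k, [None])
            for rv in r.get(k, [None])]


# Hypothetical toy data shaped like the (airport, date) and (airport, state) pairs.
flight_pairs = [("JFK", "01010900"), ("JFK", "01011200"), ("SEA", "01010700")]
airport_pairs = [("JFK", "NY"), ("BOS", "MA")]

print(join(flight_pairs, airport_pairs))
# [('JFK', ('01010900', 'NY')), ('JFK', ('01011200', 'NY'))]
print(full_outer_join(flight_pairs, airport_pairs))
# [('JFK', ('01010900', 'NY')), ('JFK', ('01011200', 'NY')),
#  ('SEA', ('01010700', None)), ('BOS', (None, 'MA'))]
```

The unmatched SEA flight is absent from the inner join but survives in the left and full outer joins with None on the airport side. The unmatched BOS airport appears only in the right and full outer joins.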
Look at the following code snippet:
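# Flights data: key each record by its airport code (c[3]), keep the date field (c[0])
# e.g. (u'JFK', u'01010900')
flt = flights.map(lambda c: (c[3], c[0]))
# Airports data: key each record by its airport code (c[3]), keep the state (c[1])
# e.g. (u'JFK', u'NY')
air = airports.map(lambda c: (c[3], c[1]))
# Execute an inner join of flt and air on the airport code
flt.join(air).take(5)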
This will give you the following result:
# Output
[(u'JFK', (u'01010900', u'NY')),
(u'JFK', (u'01011200', u'NY')),
(u'JFK', (u'01011900', u'NY')),
(u'JFK', (u'01011700', u'NY')),
(u'JFK', (u'01010800', u'NY'))]
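Every JFK flight in the output above is paired with the same u'NY' value. This happens because join emits one row per combination of left and right values sharing a key. The result size is therefore the sum, over shared keys, of the left count times the right count. A duplicated key in the airports data would silently multiply the matching flight rows. The sketch below does that arithmetic in plain Python. The `joined_row_count` helper and the key lists are hypothetical. With real pair RDDs, the per-key counts could come from each RDD's `countByKey()`, which returns the counts to the driver.

```python
from collections import Counter


# Hypothetical helper: predicts how many rows an inner join would return,
# given the key of every record on each side.
def joined_row_count(left_keys, right_keys):
    """Sum over shared keys of (#left records) * (#right records)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Counter returns 0 for absent keys, so unmatched keys add nothing.
    return sum(n * right_counts[k] for k, n in left_counts.items())


# Five JFK flights against one JFK airport record: five joined rows.
print(joined_row_count(["JFK"] * 5 + ["SEA"], ["JFK", "BOS"]))  # 5
# A duplicated JFK airport record doubles the JFK rows.
print(joined_row_count(["JFK"] * 5, ["JFK", "JFK"]))  # 10
```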