- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
Applying SQL table joins
In order to examine table joins, we have created some additional test data. Let's consider banking data: an account table stored in account.json and a client table stored in client.json. Let's take a look at the two JSON files.
First, let's look at client.json:
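The original listing is not reproduced here; a minimal illustrative client.json might look like the following, with one JSON object per line as Spark's JSON reader expects. The id field is referenced by the text; the name field is an assumption for illustration:

```json
{"id": 1, "name": "John Doe"}
{"id": 2, "name": "Jane Doe"}
```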

Next, let's look at account.json:
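Again, the original listing is missing; an illustrative account.json could look like this. The clientId field is referenced by the text; accountId and amount are assumed field names:

```json
{"accountId": 1, "clientId": 1, "amount": 1000.0}
{"accountId": 2, "clientId": 1, "amount": 500.5}
{"accountId": 3, "clientId": 2, "amount": 1200.0}
```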

As you can see, the clientId field of account.json refers to the id field of client.json. Therefore, we can join the two files, but before we do so, we have to load them:
val client = spark.read.json("client.json")
val account = spark.read.json("account.json")
Then we register these two DataFrames as temporary tables:
client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")
Let's query these individually, client first:
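A simple query against the client temporary table can be issued through spark.sql; this is a sketch of the step the missing listing performed:

```scala
// Select everything from the "client" temporary view and print it.
spark.sql("select * from client").show()
```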

Then follow it up with account:
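The corresponding query for the account table looks the same, only against the other temporary view:

```scala
// Select everything from the "account" temporary view and print it.
spark.sql("select * from account").show()
```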

Now we can join the two tables:
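A minimal sketch of the join, using the relationship stated above (account.clientId referencing client.id); the projected name column is an assumed field from the illustrative data:

```scala
// Inner-join the two temporary views on the foreign-key relationship.
val joined = spark.sql("""
  select c.id, c.name, a.amount
  from client c
  join account a on c.id = a.clientId
""")
joined.show()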

Finally, let's compute an aggregation: the total amount of money that every client holds across all of their accounts:
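One way to express this aggregation in SQL, assuming the amount field from the illustrative data, is to join and then group by the client id:

```scala
// Sum each client's account balances: join on the foreign key,
// then group by the client id and aggregate the amounts.
val totals = spark.sql("""
  select c.id, sum(a.amount) as total
  from client c
  join account a on c.id = a.clientId
  group by c.id
""")
totals.show()
```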
