
Applying SQL table joins

In order to examine table joins, we have created some additional test data based on a banking scenario: an account table stored in account.json and a customer table stored in client.json. Let's take a look at the two JSON files.

First, let's look at client.json:
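
A minimal sketch of what such a file could contain, in the one-JSON-object-per-line format that spark.read.json expects, is shown below; the id field is the one the join relies on, while name is purely a hypothetical placeholder:

{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}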

Next, let's look at account.json:
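
A sketch in the same spirit for the account data; here clientId links back to id in client.json, and amount is only an assumed name for the balance column that the aggregation at the end of this section sums up:

{"clientId": 1, "amount": 1500.0}
{"clientId": 1, "amount": 250.0}
{"clientId": 2, "amount": 900.0}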

As you can see, the clientId field of account.json refers to the id field of client.json. We can therefore join the two files, but before doing so we have to load them:

val client = spark.read.json("client.json")
val account = spark.read.json("account.json")

Then we register these two DataFrames as temporary views so that we can run SQL queries against them:

client.createOrReplaceTempView("client")
account.createOrReplaceTempView("account")

Let's query these individually, client first:
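
A sketch of such a query, reusing the spark session from the snippets above (show() prints the first rows of the result):

spark.sql("SELECT * FROM client").show()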

Then follow it up with account:
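
The corresponding sketch for the account view:

spark.sql("SELECT * FROM account").show()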

Now we can join the two tables:
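
A sketch of the join, based on the id/clientId relationship described above:

spark.sql("""
  SELECT *
  FROM client c
  JOIN account a ON c.id = a.clientId
""").show()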

Finally, let's compute an aggregation of the amount of money that every client holds across all of their accounts:
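
A sketch of such an aggregation, assuming the account data carries a numeric column named amount (that column name is not given in the text and is only an assumption here):

spark.sql("""
  -- sum the (assumed) amount column per client
  SELECT c.id, SUM(a.amount) AS total_amount
  FROM client c
  JOIN account a ON c.id = a.clientId
  GROUP BY c.id
""").show()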
