- Python Data Analysis(Second Edition)
- Armando Fandango
- 447字
- 2021-07-09 19:04:08
Joining DataFrames
To demonstrate joining, we will use two CSV files-dest.csv
and tips.csv
. The use case behind it is that we are running a taxi company. Every time a passenger is dropped off at his or her destination, we add a row to the dest.csv
file with the employee number of the driver and the destination:
EmpNr,Dest5,The Hague3,Amsterdam9,Rotterdam
Sometimes drivers get a tip, so we want that registered in the tips.csv
file (if this doesn't seem realistic, please feel free to come up with your own story):
EmpNr,Amount5,109,57,2.5
Database-like joins in Pandas can be done with either the merge()
function or the join()
DataFrame method. The join()
method joins onto indices by default, which might not be what you want. In SQL a relational database query language we have the inner join, left outer join, right outer join, and full outer join.
Note
An inner join selects rows from two tables, if and only if values match, for columns specified in the join condition. Outer joins do not require a match, and can potentially return more rows. More information on joins can be found at http://en.wikipedia.org/wiki/Join_%28SQL%29.
All these join types are supported by Pandas, but we will only take a look at inner joins and full outer joins:
- A join on the employee number with the
merge()
function is performed as follows:print("Merge() on key\n", pd.merge(dests, tips, on='EmpNr'))
This gives an inner join as the outcome:
Merge() on key EmpNr Dest Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns]
- Joining with the
join()
method requires providing suffixes for the left and right operands:print("Dests join() tips\n", dests.join(tips, lsuffix='Dest', rsuffix='Tips'))
This method call joins index values so that the result is different from an SQL inner join:
Dests join() tips EmpNrDest Dest EmpNrTips Amount 0 5 The Hague 5 10.0 1 3 Amsterdam 9 5.0 2 9 Rotterdam 7 2.5 [3 rows x 4 columns]
- An even more explicit way to execute an inner join with
merge()
is as follows:print("Inner join with merge()\n", pd.merge(dests, tips, how='inner'))
The output is as follows:
Inner join with merge() EmpNr Dest Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns]
To make this a full outer join requires only a small change:
print("Outer join\n", pd.merge(dests, tips, how='outer'))
The outer join adds rows with
NaN
values:Outer join EmpNr Dest Amount 0 5 The Hague 10.0 1 3 Amsterdam NaN 2 9 Rotterdam 5.0 3 7 NaN 2.5 [4 rows x 3 columns]
In a relational database query, these values would have been set to NULL
. The demo code is in the ch-03.ipynb
file of this book's code bundle.
- 程序員面試白皮書
- Python 3.7網(wǎng)絡(luò)爬蟲快速入門
- Oracle Database In-Memory(架構(gòu)與實(shí)踐)
- Architecting the Industrial Internet
- PHP網(wǎng)絡(luò)編程學(xué)習(xí)筆記
- Blockly創(chuàng)意趣味編程
- Building Wireless Sensor Networks Using Arduino
- INSTANT Yii 1.1 Application Development Starter
- 微信小程序開發(fā)實(shí)戰(zhàn):設(shè)計(jì)·運(yùn)營(yíng)·變現(xiàn)(圖解案例版)
- SQL Server 2008中文版項(xiàng)目教程(第3版)
- Node.js區(qū)塊鏈開發(fā)
- Angular Design Patterns
- C語(yǔ)言程序設(shè)計(jì)教程
- Linux Networking Cookbook
- Learning Node.js for Mobile Application Development