- Python Data Mining Quick Start Guide
- Nathan Greeneltch
- 2021-06-24 15:19:48
Plotting and exploring data – harnessing the power of Seaborn
Now let's start our analysis with Seaborn's canned plotting routine, pairplot, which visualizes pairwise feature relationships. You can use it to hunt down relationships, spot candidate groupings and possible outliers, and build an intuition for which downstream analysis strategies to investigate. Each off-diagonal cell is a pairwise scatter plot, and the diagonal cells show univariate distributions:
# explore with Seaborn pairplot
# (df is the iris DataFrame loaded earlier, with a 'species' label column)
import seaborn as sns
sns.pairplot(df, hue='species')
You will see the following output after executing the preceding code:

Sometimes a histogram is easier to read than a probability-density plot for understanding a distribution. With Seaborn, we can pass diag_kind='hist' and re-plot to show histograms on the diagonals.
We can also change the aesthetics with the palette and markers args; markers takes one marker per hue level. See the Seaborn documentation for more available args. Let's re-plot as follows:
# add histograms to diagonals of Seaborn pairplot
sns.pairplot(df, hue='species', diag_kind='hist',
             palette='bright', markers=['o', 'x', 'v'])
You will see the following output after executing the preceding code:

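Seaborn expects markers to be either a single marker or a list with exactly one marker per hue level, so a hard-coded three-item list breaks if the number of classes changes. A small helper can derive the list from the data instead. The helper markers_for and its marker pool are hypothetical choices for illustration, not part of Seaborn.

```python
import pandas as pd

# Hypothetical pool of matplotlib marker codes; extend it if you have more classes.
MARKER_POOL = ["o", "x", "v", "s", "D", "^", "P", "*"]


def markers_for(df: pd.DataFrame, hue: str) -> list:
    """Return one marker per distinct value of `hue`, as pairplot/lmplot expect."""
    n_levels = df[hue].nunique()
    if n_levels > len(MARKER_POOL):
        raise ValueError(
            f"{n_levels} hue levels but only {len(MARKER_POOL)} markers defined"
        )
    return MARKER_POOL[:n_levels]


# Toy frame with three classes, like the iris species.
toy = pd.DataFrame({"species": ["setosa", "versicolor", "virginica", "setosa"]})
print(markers_for(toy, "species"))  # → ['o', 'x', 'v']
```

With the book's df, the re-plot would become sns.pairplot(df, hue='species', diag_kind='hist', palette='bright', markers=markers_for(df, 'species')), and the same list works for lmplot's markers argument.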
At this point, we can choose two variables and plot them against each other with Seaborn's lmplot. When a dataset has more than about five features, the pair plot grid becomes crowded and important variable relationships can be hard to spot in a single window. You can use this bivariate scatter plot to isolate and view the important pairings; fit_reg=False turns off the regression line that lmplot draws by default:
# plot bivariate scatter with Seaborn
sns.lmplot(x='petal length in cm', y='petal width in cm',
           hue='species', data=df, fit_reg=False,
           palette='bright', markers=['o', 'x', 'v'])
You will see the following output after executing the preceding code:

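When there are too many features for a readable pair plot, one way to decide which pairings to isolate with lmplot is to rank numeric feature pairs by absolute Pearson correlation. This is an assumed heuristic, not the book's stated method, and rank_feature_pairs is a hypothetical helper built on pandas' DataFrame.corr; non-numeric columns such as species are ignored.

```python
from itertools import combinations

import pandas as pd


def rank_feature_pairs(df: pd.DataFrame, top_n: int = 5) -> list:
    """Return up to top_n (feature_a, feature_b, r) tuples, sorted by |Pearson r|."""
    corr = df.select_dtypes("number").corr()  # pairwise Pearson correlation matrix
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)  # strongest linear relationships first
    return pairs[:top_n]


# Toy data: y is an exact multiple of x, z is only partly (negatively) related.
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "species": ["a", "a", "b", "b", "b"],  # non-numeric, dropped by select_dtypes
})
for a, b, r in rank_feature_pairs(demo):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

The top pair can then be plotted with, for example, a, b, _ = rank_feature_pairs(df)[0] followed by sns.lmplot(x=a, y=b, hue='species', data=df, fit_reg=False). Correlation only captures linear association, so class-separated or curved structure can still hide in low-ranked pairs.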
A popular way to take a quick look at a single feature is the violin plot. Many practitioners prefer violins because one plot shows both the raw value distribution and the spread of each class. Each violin is the univariate distribution of the feature's values within one class, drawn as a probability density mirrored around a vertical axis, with the classes placed side by side like box plots. This probably sounds convoluted, but one look at the plot gets the idea across with ease. The more violin plots you see, the more you will learn to love them:
sns.violinplot(x='species', y='petal length in cm', data=df)
You will see the following output after executing the preceding code:

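Each violin is essentially a kernel density estimate computed within one class. The sketch below computes those per-class curves directly with scipy.stats.gaussian_kde rather than Seaborn's internal code, using synthetic two-class data as a stand-in for the iris feature; class_densities is a hypothetical helper. Seaborn applies its own bandwidth and cut settings, so its violin shapes will not match these curves exactly.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def class_densities(df, feature, class_col, grid):
    """Gaussian KDE of `feature`, estimated separately for each class in `class_col`."""
    return {
        label: gaussian_kde(group[feature].to_numpy())(grid)
        for label, group in df.groupby(class_col)
    }


# Synthetic stand-in data (assumption): two classes with well-separated centers.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "length": np.concatenate([rng.normal(1.5, 0.3, 50), rng.normal(5.0, 0.5, 50)]),
    "label": ["short"] * 50 + ["long"] * 50,
})
grid = np.linspace(-2.0, 10.0, 1201)
densities = class_densities(toy, "length", "label", grid)

for label, dens in densities.items():
    # One curve per class, i.e. one half of a violin; area ~1, peak near the class center.
    area = dens.sum() * (grid[1] - grid[0])  # simple Riemann-sum integral
    print(f"{label}: peak at {grid[dens.argmax()]:.2f}, area ~ {area:.3f}")
```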