
Example of multilinear regression - step-by-step methodology of model building

In this section, we show the approach that industry practitioners follow when modeling with linear regression, using sample wine data. The statsmodels.api package is used for the multiple linear regression demonstration instead of scikit-learn, because the former provides diagnostics on individual variables, whereas the latter mainly reports final accuracy and similar overall metrics:

>>> import numpy as np 
>>> import pandas as pd 
>>> import statsmodels.api as sm 
>>> import matplotlib.pyplot as plt 
>>> import seaborn as sns 
>>> from sklearn.model_selection import train_test_split     
>>> from sklearn.metrics import r2_score 
 
>>> wine_quality = pd.read_csv("winequality-red.csv", sep=';')
# Replace spaces in column names with underscores for easier handling
>>> wine_quality.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)
>>> eda_colnms = ['volatile_acidity', 'chlorides', 'sulphates', 'alcohol', 'quality']
# Plots - pair plots
>>> sns.set(style='whitegrid', context='notebook')

Pair plots for a sample of five variables are shown as follows; we encourage you to try other combinations of variables to inspect their relationships visually:

# Note: height replaces the older size argument in seaborn 0.9 and later
>>> sns.pairplot(wine_quality[eda_colnms], height=2.5, x_vars=eda_colnms, y_vars=eda_colnms)
>>> plt.show() 
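
To try other combinations systematically, the column subsets can be generated rather than typed by hand. The following is a minimal sketch, not part of the original workflow: the helper name `predictor_subsets` and the choice of four predictors per plot are assumptions made for illustration, and the column list is assumed to match the renamed wine columns.

```python
from itertools import combinations


def predictor_subsets(columns, target="quality", k=4):
    """Yield column lists made of k predictors followed by the target.

    Hypothetical helper: each yielded list can be used to select the
    columns for one pair plot.
    """
    predictors = [c for c in columns if c != target]
    for combo in combinations(predictors, k):
        yield list(combo) + [target]


# Assumed subset of the renamed wine columns, used only for illustration
cols = ["fixed_acidity", "volatile_acidity", "citric_acid",
        "chlorides", "sulphates", "alcohol", "quality"]

subsets = list(predictor_subsets(cols, k=4))
print(len(subsets))   # number of ways to pick 4 of the 6 predictors
print(subsets[0])     # first subset, with the target as the last column
```

Each subset could then be passed to `sns.pairplot(wine_quality[subset])` in a loop, although plotting every combination quickly becomes slow as the number of columns grows.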

In addition to visual plots, correlation coefficients are calculated to quantify the strength of each relationship numerically; such correlation charts help to drop variables at the initial stage when there are many of them to start with:

>>> # Correlation coefficients 
>>> corr_mat = np.corrcoef(wine_quality[eda_colnms].values.T) 
>>> sns.set(font_scale=1) 
>>> full_mat = sns.heatmap(corr_mat, cbar=True, annot=True, square=True, fmt='.2f',annot_kws={'size': 15}, yticklabels=eda_colnms, xticklabels=eda_colnms) 
>>> plt.show() 
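
The idea of using correlations to drop variables early can be sketched as a simple screen on each predictor's correlation with the target. This is one possible illustration under stated assumptions, not the procedure used in the text: the helper `weakly_correlated`, the 0.2 cutoff, and the synthetic data are all hypothetical, and in practice the threshold is a judgment call.

```python
import numpy as np
import pandas as pd


def weakly_correlated(df, target, threshold=0.1):
    """Return predictors whose absolute Pearson correlation with the
    target falls below the threshold (hypothetical screening helper)."""
    corr = df.corr()[target].drop(target)
    return sorted(corr[corr.abs() < threshold].index)


# Synthetic stand-in data, not the wine dataset
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
demo = pd.DataFrame({
    "strong": signal + rng.normal(scale=0.3, size=n),  # tracks the target closely
    "noise": rng.normal(size=n),                       # unrelated to the target
    "target": signal,
})

# Assumed cutoff of 0.2; candidates for dropping are listed by name
drop_candidates = weakly_correlated(demo, "target", threshold=0.2)
print(drop_candidates)
```

A screen like this only looks at pairwise linear relationships, so a variable flagged here may still matter in combination with others; it is a first pass, not a final selection.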