- Python Machine Learning Blueprints
- Alexander Combs Michael Roman
- 2021-07-02 13:49:39
Statsmodels
The first library we'll cover is the statsmodels library (http://statsmodels.sourceforge.net/). Statsmodels is a Python package that is well documented and developed for exploring data, estimating models, and running statistical tests. Let's use it here to build a simple linear regression model of the relationship between sepal length and sepal width for the setosa species.
First, let's visually inspect the relationship with a scatterplot:
fig, ax = plt.subplots(figsize=(7,7))
# the first 50 rows hold the setosa samples
ax.scatter(df['sepal width (cm)'][:50], df['sepal length (cm)'][:50])
ax.set_ylabel('Sepal Length')
ax.set_xlabel('Sepal Width')
ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)
The preceding code generates a scatterplot of sepal width against sepal length for the setosa samples.
So, we can see that there appears to be a positive linear relationship; that is, as the sepal width increases, the sepal length does as well. We'll next run a linear regression on the data using statsmodels to estimate the strength of that relationship:
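import statsmodels.api as sm
# use the same column names and setosa rows as in the scatterplot
y = df['sepal length (cm)'][:50]
x = df['sepal width (cm)'][:50]
# add a column of ones so the model estimates an intercept
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
print(results.summary())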
The preceding code prints the OLS regression summary table.
The summary output contains the results of our simple regression model. Since this is a linear regression, the model takes the form Y = β0 + β1X, where β0 is the intercept and β1 is the regression coefficient. Here, the formula would be Sepal Length = 2.6447 + 0.6909 * Sepal Width. We can also see that the R² for the model is a respectable 0.558, and the p-value, Prob (F-statistic), is highly significant, at least for this species.
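To make the intercept, slope, and R² less of a black box, here is a minimal sketch of the closed-form calculation for a one-variable regression, written in plain Python. The function name fit_simple_ols and the five measurements are hypothetical and chosen only for illustration; they are not the iris data and will not reproduce the coefficients quoted above.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope: covariance of x and y divided by the variance of x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx

    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x

    # R^2: share of the variance in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical measurements for illustration only (not the iris data)
widths = [3.0, 3.2, 3.4, 3.6, 3.8]
lengths = [4.7, 4.8, 5.1, 5.1, 5.3]

b0, b1, r2 = fit_simple_ols(widths, lengths)
print(f"Sepal Length = {b0:.4f} + {b1:.4f} * Sepal Width (R^2 = {r2:.3f})")
```

Statsmodels arrives at the same coefficients through a more general matrix solution, and additionally reports standard errors and significance tests that this sketch leaves out.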
Let's now use the results object to plot our regression line:
fig, ax = plt.subplots(figsize=(7,7))
# fitted values are the model's predictions for each observed x
ax.plot(x, results.fittedvalues, label='regression line')
ax.scatter(x, y, label='data point', color='r')
ax.set_ylabel('Sepal Length')
ax.set_xlabel('Sepal Width')
ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)
ax.legend(loc=2)
The preceding code generates a plot of the data points with the fitted regression line drawn through them.
Plotting results.fittedvalues against x draws the regression line, because fittedvalues holds the model's predicted sepal length for each observed sepal width.
There are a number of other statistical functions and tests in the statsmodels package, and I invite you to explore them. It is an exceptionally useful package for standard statistical modeling in Python. Let's now move on to the king of Python machine learning packages: scikit-learn.