- Python Machine Learning Blueprints
- Alexander Combs Michael Roman
- 2021-07-02 13:49:39
Statsmodels
The first library we'll cover is statsmodels (http://statsmodels.sourceforge.net/). Statsmodels is a well-documented Python package for exploring data, estimating models, and running statistical tests. Let's use it here to build a simple linear regression model of the relationship between sepal length and sepal width for the setosa species.
First, let's visually inspect the relationship with a scatterplot:
    fig, ax = plt.subplots(figsize=(7,7))
    ax.scatter(df['sepal width (cm)'][:50], df['sepal length (cm)'][:50])
    ax.set_ylabel('Sepal Length')
    ax.set_xlabel('Sepal Width')
    ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)
The preceding code generates the following output:

The plot suggests a positive linear relationship; that is, as the sepal width increases, the sepal length tends to increase as well. Next, we'll run a linear regression on the data using statsmodels to estimate the strength of that relationship:
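    import statsmodels.api as sm
    # Column names match the ones used for the scatterplot; the first 50 rows are setosa
    y = df['sepal length (cm)'][:50]
    x = df['sepal width (cm)'][:50]
    # add_constant adds the intercept column that OLS does not include by default
    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()
    print(results.summary())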
The preceding code generates the following output:

The preceding output shows the results of our simple regression model. Since this is a linear regression, the model takes the form Y = β0 + β1X, where β0 is the intercept and β1 is the regression coefficient (the slope). Here, the fitted formula is Sepal Length = 2.6447 + 0.6909 * Sepal Width. We can also see that the R² for the model is a respectable 0.558, and the p-value (the Prob (F-statistic) entry of the summary) is highly significant, at least for this species.
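To make the intercept, slope, and R² concrete, here is a minimal sketch of the closed-form least-squares calculation for a single predictor, written in plain Python. The function name simple_ols and the five data points are hypothetical and chosen only for illustration; they are not the iris measurements, and this is not how statsmodels computes its results internally.

```python
# Closed-form simple linear regression (ordinary least squares) in pure Python.
# The data below is made up for illustration; it is NOT the iris data.

def simple_ols(xs, ys):
    """Return (intercept, slope, r_squared) for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    # R^2: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical widths and lengths, chosen only to exercise the function.
widths = [3.0, 3.2, 3.4, 3.6, 3.8]
lengths = [4.7, 4.9, 5.0, 5.2, 5.2]

b0, b1, r2 = simple_ols(widths, lengths)
print(f"length = {b0:.4f} + {b1:.4f} * width, R^2 = {r2:.3f}")
```

Reading a fitted formula works the same way whichever tool produced it: plugging a sepal width of 3.5 cm into the formula quoted above gives 2.6447 + 0.6909 * 3.5, or roughly 5.06 cm of predicted sepal length.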
Let's now use the results object to plot our regression line:
    fig, ax = plt.subplots(figsize=(7,7))
    ax.plot(x, results.fittedvalues, label='regression line')
    ax.scatter(x, y, label='data point', color='r')
    ax.set_ylabel('Sepal Length')
    ax.set_xlabel('Sepal Width')
    ax.set_title('Setosa Sepal Width vs. Sepal Length', fontsize=14, y=1.02)
    ax.legend(loc=2)
The preceding code generates the following output:

The results.fittedvalues attribute holds the model's predicted sepal length for each observed sepal width, so plotting it against x draws the fitted regression line through the data.
There are a number of other statistical functions and tests in the statsmodels package, and I invite you to explore them. It is an exceptionally useful package for standard statistical modeling in Python. Let's now move on to the king of Python machine learning packages: scikit-learn.