  • The Data Science Workshop
  • Anthony So Thomas V. Joseph Robert Thas John Andrew Worsley Dr. Samuel Asare
  • 2021-06-11 18:27:22

Summary

This chapter introduced linear regression analysis using Python. We learned that regression analysis, in general, is a supervised machine learning or data science problem. We learned about the fundamentals of linear regression analysis, including the ideas behind the method of least squares. We also learned how to use the pandas Python module to load and prepare data for exploration and analysis.
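As a reminder of what the method of least squares does for a single independent variable, the following is a minimal sketch in plain Python. The function name `fit_line` and the sample data are illustrative assumptions and do not come from the chapter's exercises; the formulas are the standard closed-form solution for the slope and intercept that minimize the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line. With real data, the same formulas give the line with the smallest sum of squared residuals.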

We explored how to create scatter plots of bivariate data and how to fit a line of best fit through them. Along the way, we discovered the power of the statsmodels module in Python. We explored how to use it to define simple linear regression models and to solve for the relevant model parameters. We also learned how to extend that to situations with more than one independent variable, that is, multiple linear regression. We investigated how to transform a non-linear relationship between a dependent and an independent variable so that the problem becomes linear after the transformation and can be handled using linear regression. We took a closer look at the statsmodels formula language and learned how to use it to define a variety of linear models and to solve for their respective model parameters.
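The following sketch shows how a transformation can be expressed directly in the statsmodels formula language. The data is synthetic and generated here purely for illustration (an exponential relationship with a little noise), and the column names `x` and `y` are assumptions. Taking the logarithm of `y` inside the formula turns the exponential relationship into one that is linear in `x`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption for this sketch): y = exp(0.5 + 0.3 * x + noise)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.05, size=x.size))
df = pd.DataFrame({"x": x, "y": y})

# The formula applies np.log to the dependent variable before fitting,
# so the model being solved is log(y) = Intercept + slope * x
model = smf.ols("np.log(y) ~ x", data=df).fit()

print(model.params)
```

The fitted `Intercept` and `x` coefficients should land close to the values 0.5 and 0.3 used to generate the data. A multiple linear regression would follow the same pattern, with further terms added to the right-hand side of the formula, for example `"y ~ x1 + x2"`.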

We continued to learn about the ideas underpinning model goodness of fit. We discussed the R-squared statistic as a measure of the goodness of fit for regression models. We followed our discussions with the basic concepts of statistical significance. We learned how to validate a regression model globally using the F-statistic, which Python calculates for us. We also examined how to check the statistical significance of individual model coefficients using t-tests and their associated p-values. We reviewed the assumptions of linear regression analysis and how they affect the validity of any regression analysis work.
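To make the R-squared and F-statistic values concrete, here is a minimal sketch that computes both by hand in plain Python. The helper name `r_squared_and_f` and the small dataset are illustrative assumptions. The F-statistic formula used here assumes an ordinary least squares fit that includes an intercept. In practice, a fitted statsmodels result exposes these values as `rsquared` and `fvalue`, so we rarely need to calculate them ourselves.

```python
def r_squared_and_f(ys, fitted, n_predictors):
    """Return (R-squared, F-statistic) for an OLS fit that includes an intercept."""
    n = len(ys)
    y_mean = sum(ys) / n
    # Total variation of y around its mean
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    # Variation left unexplained by the model
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    r2 = 1.0 - ss_resid / ss_total
    # Explained variation per predictor over unexplained variation per residual degree of freedom
    f_stat = (r2 / n_predictors) / ((1.0 - r2) / (n - n_predictors - 1))
    return r2, f_stat


# Hypothetical observations for x = 1..5 and the values fitted by the
# least-squares line 2.2 + 0.6 * x
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, f_stat = r_squared_and_f(ys, fitted, n_predictors=1)
print(round(r2, 3), round(f_stat, 3))  # 0.6 4.5
```

An R-squared of 0.6 means the line accounts for 60% of the variation in the observed values. The F-statistic is then compared against an F distribution to obtain the p-value for the model as a whole.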

We will now move on from regression analysis. Chapter 3, Binary Classification, and Chapter 4, Multiclass Classification with RandomForest, will discuss binary and multiclass classification, respectively. These chapters will introduce the techniques needed to handle supervised data science problems where the dependent variable is of the categorical data type.

Regression analysis will be revisited when the important topics of model performance improvement and interpretation are given a closer look later in the book. In Chapter 8, Hyperparameter Tuning, we will see how to use k-nearest neighbors as another method for carrying out regression analysis. We will also be introduced to ridge regression, a linear regression method that is useful for situations where there is a large number of parameters.
