
Chapter 3. Multiple Regression in Action

In the previous chapter, we introduced linear regression as a supervised machine learning method rooted in statistics. Such a method forecasts numeric values using a combination of predictors, which can be continuous numeric values or binary variables, under the assumption that the data at hand displays a certain relationship (a linear one, measurable by a correlation) with the target variable. To smoothly introduce the main concepts and easily explain how the method works, we limited our example models to a single predictor variable, leaving it the entire burden of modeling the response.

However, in real-world applications, there may be a few very important causes determining the events you want to model, but it is rare that a single variable can take the stage alone and produce a working predictive model. The world is complex (and indeed interrelated in a mix of causes and effects) and often cannot be explained without considering various causes, influences, and hints. Usually, several variables have to work together to achieve better and more reliable predictions.

Such an intuition decisively affects the complexity of our model, which from this point forth will no longer be easily represented on a two-dimensional plot. Given multiple predictors, each of them constitutes a dimension of its own, and we will have to consider that our predictors are not just related to the response but may also be related among themselves (sometimes very strongly), a characteristic of data called multicollinearity.
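As a quick illustration of what multicollinearity looks like in practice, the following minimal sketch (with made-up data and hypothetical variable names, not one of this book's datasets) builds three predictors, two of which are deliberately correlated with each other, and prints their pairwise correlation matrix; large off-diagonal values among the predictors themselves are the telltale sign:

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: x2 is built to be strongly correlated with x1,
# while x3 is generated independently of both.
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=200)
x3 = rng.normal(size=200)
X = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})

# Correlations among the predictors (not with the response):
# a high x1-x2 correlation hints at multicollinearity.
print(X.corr().round(2))
```

The correlation matrix is only a first, pairwise check; it is enough here to show that predictors can carry overlapping information, which is exactly the situation we will have to keep under control in a multiple regression.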

Before starting, we'd like to say a few words about the selection of topics we are going to cover. Though the statistical literature contains a large number of publications and books devoted to regression assumptions and diagnostics, you will find little of that here, because we leave such topics out. We limit ourselves to discussing the problems and aspects that can affect the results of a regression model, taking a practical data science approach rather than a purely statistical one.

Given such premises, in this chapter we are going to:

  • Extend the procedures for simple regression to multiple regression, keeping an eye on possible sources of trouble such as multicollinearity
  • Understand the importance of each term in your linear model equation
  • Make your variables work together and increase your ability to predict using interactions between variables
  • Leverage polynomial expansions to increase the fit of your linear model to non-linear functions (a brief sketch follows this list)
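
To give a first taste of the last two points, here is a minimal sketch (using scikit-learn's PolynomialFeatures on a hypothetical two-column array, an approach assumed for illustration rather than taken from this chapter's examples) showing how interaction and squared terms can be generated from the original predictors:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical predictors observed on five cases.
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [4.0, 4.0],
              [5.0, 6.0]])

# degree=2 adds the squared terms x1^2 and x2^2 and the interaction x1*x2
# to the original columns; include_bias=False drops the constant column.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X)

print(X_expanded.shape)                          # (5, 5)
print(poly.get_feature_names_out(['x1', 'x2']))  # x1, x2, x1^2, x1 x2, x2^2
```

The expanded matrix can then be fed to an ordinary linear regression: the model stays linear in its coefficients, but the interaction and polynomial columns let it capture non-linear patterns in the data, which is the idea developed later in the chapter.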