Polynomial regression

In linear regression, the relationship between the independent and the dependent variables is best represented by a straight line. Real-life datasets are often more complex, and the relationship between cause and effect is not always linear. A straight-line equation then does not fit the data points well and hence cannot produce an effective predictive model.

In such cases, we can consider using a higher-order polynomial equation, such as a quadratic or a cubic, for the predictor function. Given x as the independent variable and y as the dependent variable, the polynomial function takes the following forms:
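Written out, the forms look as follows. The coefficient names b_0 through b_n are a conventional choice assumed here for illustration and may differ from the notation used elsewhere in the book:

```latex
\begin{align*}
y &= b_0 + b_1 x && \text{(first order, straight line)} \\
y &= b_0 + b_1 x + b_2 x^2 && \text{(second order, quadratic)} \\
y &= b_0 + b_1 x + b_2 x^2 + b_3 x^3 && \text{(third order, cubic)} \\
y &= b_0 + b_1 x + \dots + b_n x^n && \text{(order } n\text{)}
\end{align*}
```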

These can be visualized with a small set of sample data as follows:

Figure 3.11 Polynomial prediction function

Note that the straight line cannot accurately represent the relationship between x and y. As we model the prediction function with higher-order polynomials, R² (the coefficient of determination) improves, which means the model fits the sample data more closely.
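The effect of the polynomial order on R² can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: it assumes NumPy is available, the sample data is synthetic and hypothetical, and `r_squared` is a helper name introduced here.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical sample data (not from the text): a curved trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 1.5 * x - 0.3 * x**2 + 0.02 * x**3 + rng.normal(0.0, 0.5, x.size)

scores = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)       # predictions on the same sample
    scores[degree] = r_squared(y, y_hat)
    print(f"degree {degree}: R^2 = {scores[degree]:.4f}")
```

Because each lower-order polynomial is a special case of the next one, the R² computed on the fitted sample cannot decrease as the degree goes up. That says nothing yet about how the model behaves on new data.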

It may seem best to use the highest possible order for the prediction function in order to get the best-fitting model. However, that is not right: when the regression curve passes through every training data point, it captures the noise in the sample as well as the underlying trend, and the model fails to accurately predict outcomes for data outside the training sample (test data). This problem is called overfitting. At the other extreme, we may encounter the problem of underfitting, where the model is too simple to fit even the training data well and hence also performs poorly on the test data.
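Both failure modes can be observed by comparing the error on the training sample with the error on held-out test data. The sketch below is an illustration under stated assumptions: the data is synthetic (a quadratic trend plus noise), NumPy is assumed to be available, and the names `mse` and `true_trend` are hypothetical helpers introduced here.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

def true_trend(x):
    # Assumed underlying relationship for this illustration only.
    return 1.0 + 2.0 * x - 3.0 * x**2

rng = np.random.default_rng(42)

# Small training sample and a larger, separate test sample.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = true_trend(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_trend(x_test) + rng.normal(0.0, 0.2, x_test.size)

train_mse, test_mse = {}, {}
for degree in (1, 2, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

With 8 training points, the degree-7 polynomial passes through all of them, so its training error is essentially zero while its test error is not: that gap is the signature of overfitting. The degree-1 line shows the opposite pattern, with a high error on the training data itself. The exact numbers depend on the seed and the noise level chosen here.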
