
Interaction terms

Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other feature. For two features $x_1$ and $x_2$, this follows the formulation $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$, where $\beta_3$ is the coefficient on the interaction term. An example is available in the MASS package with the Boston dataset. The response is the median value of owner-occupied homes in thousands of dollars, which is medv in the output. We will use two features: the percentage of the population with lower socioeconomic status, which is termed lstat, and the percentage of owner-occupied units built before 1940, which is termed age in the following output:

    > library(MASS)
    > data(Boston)
    > str(Boston)
    'data.frame':   506 obs. of  14 variables:
     $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
     $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
     $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
     $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
     $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
     $ rm     : num  6.58 6.42 7.18 7 7.15 ...
     $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
     $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
     $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
     $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
     $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
     $ black  : num  397 397 393 395 397 ...
     $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
     $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
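Since the model uses only three of these columns, it can help to look at their ranges before fitting, because the units matter when reading the coefficients later. The following is a small sketch that subsets the response and the two predictors; the name `vars` is a hypothetical helper introduced only for this illustration.

```r
library(MASS)   # the Boston data ships with the MASS package
data(Boston)

# Keep only the response and the two predictors used in the model
vars <- Boston[, c("medv", "lstat", "age")]

# medv is in thousands of dollars; lstat and age are percentages,
# so age should lie between 0 and 100
summary(vars)
sapply(vars, range)
```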

In an lm() formula, writing feature1*feature2 puts both features as well as their interaction term in the model; here, lstat * age expands to lstat + age + lstat:age, as follows:

    > value.fit <- lm(medv ~ lstat * age, data = Boston)
    > summary(value.fit)

    Call:
    lm(formula = medv ~ lstat * age, data = Boston)

    Residuals:
        Min      1Q  Median      3Q     Max
    -15.806  -4.045  -1.333   2.085  27.552

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
    lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
    age         -0.0007209  0.0198792  -0.036   0.9711
    lstat:age    0.0041560  0.0018518   2.244   0.0252 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 6.149 on 502 degrees of freedom
    Multiple R-squared:  0.5557,  Adjusted R-squared:  0.5531
    F-statistic: 209.3 on 3 and 502 DF,  p-value: < 2.2e-16
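The `*` shorthand can be checked directly: fitting the model with the terms written out explicitly should give the same coefficients. This is a minimal sketch; `fit.star` and `fit.explicit` are hypothetical names used only for the comparison.

```r
library(MASS)
data(Boston)

# lstat * age is shorthand for lstat + age + lstat:age
fit.star     <- lm(medv ~ lstat * age, data = Boston)
fit.explicit <- lm(medv ~ lstat + age + lstat:age, data = Boston)

# Term labels produced by expanding the formula:
# returns "lstat", "age", "lstat:age"
attr(terms(fit.star), "term.labels")

# Both fits share the same design matrix, so this should be TRUE
all.equal(coef(fit.star), coef(fit.explicit))
```

Note that `lstat:age` adds only the product term, so `medv ~ lstat:age` on its own would drop both main effects.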

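Examining the output, we can see that, while lstat is a highly significant feature (p = 8.78e-16), the main effect of age is not (p = 0.9711). However, the lstat:age interaction is significant at the 0.05 level (p = 0.0252), and its coefficient is positive: the higher the share of pre-1940 housing, the less steeply the predicted home value falls as lstat rises.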
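Because of the interaction, the effect of lstat is not a single number. Differentiating the fitted model with respect to lstat gives $\partial\,\hat{Y} / \partial\,\text{lstat} = \hat\beta_{\text{lstat}} + \hat\beta_{\text{lstat:age}} \cdot \text{age}$. The sketch below evaluates that slope at a few illustrative values of age; the helper `lstat.slope` and the grid `ages` are hypothetical names chosen for this example. With the coefficients reported above, the slope moves from about -1.39 at age = 0 to about -0.98 at age = 100.

```r
library(MASS)
data(Boston)
value.fit <- lm(medv ~ lstat * age, data = Boston)

b <- coef(value.fit)

# Hypothetical helper: conditional slope of lstat for a given age,
# i.e. b["lstat"] + b["lstat:age"] * age
lstat.slope <- function(age) unname(b["lstat"] + b["lstat:age"] * age)

# Illustrative grid of age values (percent of units built before 1940)
ages <- c(0, 25, 50, 75, 100)
data.frame(age = ages, lstat.slope = lstat.slope(ages))
```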
