
2.1.4 Python Case Study: Variable Selection for Multiple Linear Regression

This section demonstrates variable selection by forward stepwise regression. We first define a forward-selection function:


from statsmodels.formula.api import ols


def forward_select(data, response):
    """Forward selection: add one variable at a time while AIC keeps falling."""
    remaining = set(data.columns)
    remaining.remove(response)
    selected = []
    current_score, best_new_score = float('inf'), float('inf')
    while remaining:
        # Fit one model per remaining candidate and record its AIC
        aic_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {}".format(
                response, ' + '.join(selected + [candidate]))
            aic = ols(formula=formula, data=data).fit().aic
            aic_with_candidates.append((aic, candidate))
        # Sort descending so that pop() returns the lowest-AIC candidate
        aic_with_candidates.sort(reverse=True)
        best_new_score, best_candidate = aic_with_candidates.pop()
        if current_score > best_new_score:
            # The best candidate lowers AIC: keep it and continue
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
            print('aic is {},continuing!'.format(current_score))
        else:
            # No remaining candidate improves AIC: stop
            print('forward selection over!')
            break
    formula = "{} ~ {}".format(response, ' + '.join(selected))
    print('final formula is {}'.format(formula))
    model = ols(formula=formula, data=data).fit()
    return model

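The code uses the Akaike information criterion (AIC) as the variable-selection criterion; smaller values are better. Using this function, we select among four predictors: income, age, district average home value, and district average income: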


data_for_select = train[['avg_exp', 'Income', 'Age', 'dist_home_val', 
                         'dist_avg_income']]
forward_select_model = forward_select(data=data_for_select, response='avg_exp')
print(forward_select_model.rsquared)

The output is as follows:


aic is 1007.6801413968115,continuing!
aic is 1005.4969816306302,continuing!
aic is 1005.2487355956046,continuing!
forward selection over!
final formula is avg_exp ~ dist_avg_income + Income + dist_home_val
0.5411512928411949

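As the output shows, AIC falls to 1005.25 and the algorithm ultimately drops Age (the final formula keeps dist_avg_income, Income, and dist_home_val). The goodness of fit R² of the resulting model is 0.541.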
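
To see why forward selection stops even though adding a variable never increases the fit on the training data, it can help to write AIC out by hand. The sketch below is a minimal illustration, assuming a least-squares fit with Gaussian errors, where AIC = 2k - 2 ln(L) and k counts the estimated coefficients including the intercept. The helper name `gaussian_aic` and all numbers are hypothetical and are not taken from the dataset above; conventions for k differ between libraries, so the values need not match a particular package exactly.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2 ln(L) for a least-squares fit with Gaussian errors.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated coefficients (intercept included; an assumption)
    """
    # Maximized Gaussian log-likelihood, with the error variance estimated as rss / n
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_likelihood


# Hypothetical numbers for illustration only
n = 70
aic_base = gaussian_aic(rss=1000.0, n=n, k=3)
aic_big_gain = gaussian_aic(rss=900.0, n=n, k=4)   # extra variable cuts RSS a lot
aic_tiny_gain = gaussian_aic(rss=995.0, n=n, k=4)  # extra variable barely helps

print(aic_big_gain < aic_base)    # the penalty of 2 is outweighed by the better fit
print(aic_tiny_gain > aic_base)   # the penalty of 2 outweighs the small gain
```

Each extra coefficient costs 2 AIC points, so a candidate is accepted only if it reduces the residual sum of squares enough to pay for that penalty. This is the comparison that `current_score > best_new_score` performs in `forward_select`.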
