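- Financial Business Algorithm Modeling: Based on Python and SAS (金融商業算法建模:基于Python和SAS)
- Zhao Renqian, Tian Jianzhong, Ye Benhua, Chang Guozhen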
- 2021-11-05 17:52:06
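2.1.4 Python Case Study: Variable Selection in Multiple Linear Regression
This section demonstrates variable selection by forward selection. We first define a forward-selection function: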
    from statsmodels.formula.api import ols


    def forward_select(data, response):
        """Forward selection of OLS predictors, scored by AIC."""
        remaining = set(data.columns)
        remaining.remove(response)
        selected = []
        current_score, best_new_score = float('inf'), float('inf')
        while remaining:
            # Fit one model per candidate added to the current selection.
            aic_with_candidates = []
            for candidate in remaining:
                formula = "{} ~ {}".format(
                    response, ' + '.join(selected + [candidate]))
                aic = ols(formula=formula, data=data).fit().aic
                aic_with_candidates.append((aic, candidate))
            # Sort descending so that pop() returns the lowest-AIC candidate.
            aic_with_candidates.sort(reverse=True)
            best_new_score, best_candidate = aic_with_candidates.pop()
            if current_score > best_new_score:
                # The best candidate lowers AIC: keep it and continue.
                remaining.remove(best_candidate)
                selected.append(best_candidate)
                current_score = best_new_score
                print('aic is {},continuing!'.format(current_score))
            else:
                # No candidate lowers AIC any further: stop.
                print('forward selection over!')
                break
        formula = "{} ~ {} ".format(response, ' + '.join(selected))
        print('final formula is {}'.format(formula))
        model = ols(formula=formula, data=data).fit()
        return model
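The code uses the Akaike information criterion (AIC) as the variable-selection criterion; the smaller its value, the better. With this function we screen four independent variables: income, age, district average house price, and district average income: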
    data_for_select = train[['avg_exp', 'Income', 'Age',
                             'dist_home_val', 'dist_avg_income']]
    forward_select_model = forward_select(data=data_for_select,
                                          response='avg_exp')
    print(forward_select_model.rsquared)
The output is as follows:
    aic is 1007.6801413968115,continuing!
    aic is 1005.4969816306302,continuing!
    aic is 1005.2487355956046,continuing!
    forward selection over!
    final formula is avg_exp ~ dist_avg_income + Income + dist_home_val
    0.5411512928411949
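As the output shows, AIC fell to 1005.25. The algorithm kept district average income, income, and district average house price, and dropped age, because adding it did not lower AIC any further. The goodness of fit R² of the final model is 0.541.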
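To make the stopping rule concrete, here is a minimal standard-library sketch of the same idea on made-up data. It is an illustration under stated assumptions, not the book's code: the helper names (`solve`, `ols_aic`, `forward_select_aic`) and the synthetic columns `x1`, `x2`, `x3` are hypothetical, and AIC is computed as 2k - 2 ln L with k counting the regression coefficients including the intercept (libraries differ on whether the error variance is also counted).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_aic(columns, y):
    """AIC of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])  # assumption: k counts coefficients only, intercept included
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    ssr = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    # Gaussian log-likelihood evaluated at the fitted coefficients.
    loglik = -n / 2 * (math.log(2 * math.pi) + math.log(ssr / n) + 1)
    return 2 * k - 2 * loglik


def forward_select_aic(features, y):
    """Add the feature that lowers AIC most, until no feature lowers it."""
    remaining = set(features)
    selected = []
    current = float("inf")
    while remaining:
        scored = sorted(
            (ols_aic([features[name] for name in selected + [cand]], y), cand)
            for cand in remaining
        )
        best_aic, best = scored[0]
        if best_aic >= current:
            break  # no candidate improves AIC
        remaining.remove(best)
        selected.append(best)
        current = best_aic
    return selected, current


# Hypothetical data: y depends on x1 and x2; x3 plays no role in generating y.
idx = range(40)
x1 = [float(i) for i in idx]
x2 = [float((7 * i) % 11) for i in idx]
x3 = [float((3 * i) % 7) for i in idx]
noise = [0.5 * ((13 * i) % 5 - 2) for i in idx]
y = [3 + 2 * a - 1.5 * b + e for a, b, e in zip(x1, x2, noise)]

selected, final_aic = forward_select_aic({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected, final_aic)
```

The penalty term 2k is what stops the search: a variable is kept only when the gain in log-likelihood outweighs the cost of one more coefficient, which is why a weak predictor can be left out even though adding it never raises R².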