German Credit

Loans are not always repaid in full, and some borrowers default. It is therefore important for a bank to identify potential defaulters from the available information. Here, we adapt the GC dataset from the RSADBE package so that the labels of its factor variables are properly reflected. The transformed dataset is available as GC2.RData in the data folder. The GC dataset itself is mainly an adaptation of the version available at https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). It has 1,000 observations and 20 covariates (independent variables), such as the status of the existing checking account, the duration, and so forth. The good_bad column records whether the loan was repaid in full. We will partition the data into training and testing parts and also create the model formula:

> library(RSADBE)
> load("../Data/GC2.RData")
> table(GC2$good_bad)
 bad good 
 300  700 
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(GC2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> GC2_Train <- GC2[Train_Test=="Train",]
> GC2_TestX <- within(GC2[Train_Test=="Test",],rm(good_bad))
> GC2_TestY <- GC2[Train_Test=="Test","good_bad"]
> GC2_Formula <- as.formula("good_bad~.")
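
Because sample() with replace = TRUE draws each "Train"/"Test" label independently, the resulting split is only approximately 70/30, and the share of bad loans can differ somewhat between the two parts. The following is a minimal sketch of how one might check this. It does not load GC2.RData; it assumes a hypothetical stand-in data frame named toy that only mimics the 300/700 class counts shown above, so the numbers it prints are not those of the real partition.

```r
# Hypothetical stand-in for GC2: only the response column, built to match
# the 300 bad / 700 good counts reported by table(GC2$good_bad).
toy <- data.frame(
  good_bad = factor(rep(c("bad", "good"), times = c(300, 700)))
)

# Same labelling scheme as in the text: one independent draw per row.
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(toy),
                     replace = TRUE, prob = c(0.7, 0.3))

# Partition sizes: near 700/300, but not exactly, since labels are random.
split_sizes <- table(Train_Test)
print(split_sizes)

# Share of bad and good loans within each partition (each row sums to 1).
class_share <- prop.table(table(Train_Test, toy$good_bad), margin = 1)
print(round(class_share, 3))
```

If the class shares in the two partitions were to differ widely, one option would be to redraw the labels with another seed or to sample within each level of good_bad; the text itself uses the simple random labelling shown here.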