
Pima Indians Diabetes

Diabetes is a health hazard that is mostly incurable, and patients diagnosed with it have to adjust their lifestyles to manage the condition. The problem here is to classify a person as diabetic or not, based on eight variables: pregnant, glucose, pressure, triceps, insulin, mass, pedigree, and age. The dataset has 768 observations and is drawn from the mlbench package:

> library(mlbench)
> data("PimaIndiansDiabetes")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(PimaIndiansDiabetes),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test=="Train",]
> PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test=="Test",],
+                                     rm(diabetes))
> PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test=="Test","diabetes"]
> PID_Formula <- as.formula("diabetes~.")
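
The objects created above (the training frame, the test covariates, the test labels, and the formula) are meant to be passed to a classifier. As a minimal sketch of how they fit together, the following script repeats the split and then fits a plain logistic regression with `glm`. The choice of `glm`, the 0.5 cutoff, and the names `PID_glm`, `PID_prob`, and `PID_pred` are assumptions made for illustration only; they are not the method this text goes on to use.

```r
library(mlbench)  # provides the PimaIndiansDiabetes data frame

# Recreate the split exactly as in the transcript above
data("PimaIndiansDiabetes")
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(PimaIndiansDiabetes),
                     replace = TRUE, prob = c(0.7, 0.3))
PimaIndiansDiabetes_Train <- PimaIndiansDiabetes[Train_Test == "Train", ]
PimaIndiansDiabetes_TestX <- within(PimaIndiansDiabetes[Train_Test == "Test", ],
                                    rm(diabetes))
PimaIndiansDiabetes_TestY <- PimaIndiansDiabetes[Train_Test == "Test", "diabetes"]
PID_Formula <- as.formula("diabetes~.")

# Assumption: logistic regression as a stand-in classifier
PID_glm <- glm(PID_Formula, data = PimaIndiansDiabetes_Train,
               family = binomial)

# For a factor response, glm models the probability of the second
# level, which is "pos" for the diabetes column
PID_prob <- predict(PID_glm, newdata = PimaIndiansDiabetes_TestX,
                    type = "response")

# Assumption: classify as "pos" when the fitted probability exceeds 0.5
PID_pred <- factor(ifelse(PID_prob > 0.5, "pos", "neg"),
                   levels = levels(PimaIndiansDiabetes_TestY))

# Confusion matrix and accuracy on the held-out observations
print(table(Predicted = PID_pred, Actual = PimaIndiansDiabetes_TestY))
print(mean(PID_pred == PimaIndiansDiabetes_TestY))
```

Keeping the test labels in `PimaIndiansDiabetes_TestY`, apart from the covariates in `PimaIndiansDiabetes_TestX`, means the model never sees the response at prediction time, and any other classifier that accepts a formula can be swapped in for `glm` without changing the evaluation lines.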

The five datasets described up to this point are all classification problems. Next, we look at one example each for regression, time series, survival, clustering, and outlier detection problems.
