Iris

Iris is probably the most famous classification dataset. The statistician Sir R. A. Fisher popularized it, using length and width measurements of petals and sepals to classify three species of iris plants, and in that work he introduced the statistical classifier known as linear discriminant analysis. Because the dataset contains three species, we convert it into a binary classification problem (setosa versus non-setosa), split it into training and test partitions, and create a formula, as seen here:

> data("iris")
> ir2 <- iris
> ir2$Species <- ifelse(ir2$Species=="setosa","S","NS")
> ir2$Species <- as.factor(ir2$Species)
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(ir2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> ir2_Train <- ir2[Train_Test=="Train",]
> ir2_TestX <- within(ir2[Train_Test=="Test",],rm(Species))
> ir2_TestY <- ir2[Train_Test=="Test","Species"]
> ir2_Formula <- as.formula("Species~.")
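
As a quick sanity check on the partitioning above, the following minimal sketch rebuilds the same objects and inspects them. It uses only base R; the name `predictors` is introduced here for illustration and is not part of the original session. Because the split is random, the partition sizes will only approximate the 70/30 proportions.

```r
# Rebuild the objects from the session above so the sketch is self-contained.
data("iris")
ir2 <- iris
ir2$Species <- as.factor(ifelse(ir2$Species == "setosa", "S", "NS"))
set.seed(12345)
Train_Test <- sample(c("Train", "Test"), nrow(ir2), replace = TRUE,
                     prob = c(0.7, 0.3))
ir2_Train <- ir2[Train_Test == "Train", ]
ir2_TestX <- within(ir2[Train_Test == "Test", ], rm(Species))
ir2_TestY <- ir2[Train_Test == "Test", "Species"]
ir2_Formula <- as.formula("Species~.")

# Number of rows assigned to each partition.
print(table(Train_Test))

# Class balance of the binary response in the training partition.
print(table(ir2_Train$Species))

# Expand the "." in the formula to see which predictors it stands for.
# 'predictors' is a hypothetical helper name used only in this sketch.
predictors <- attr(terms(ir2_Formula, data = ir2_Train), "term.labels")
print(predictors)
```

The dot in `Species~.` is shorthand for every column other than the response, so the formula can be passed unchanged to any modeling function that also receives `ir2_Train` as its data argument.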