官术网_书友最值得收藏!

Logistic regression model

The logistic regression model is a binary classification model, and it is a member of the exponential family which belongs to the class of generalized linear models. Now, let Logistic regression modeldenote the binary label:

Logistic regression model

Using the information contained in the explanatory vector Logistic regression model we are trying to build a model that will help in this task. The logistic regression model is the following:

Logistic regression model

Here, Logistic regression model is the vector of regression coefficients. Note that the logit function Logistic regression model is linear in the regression coefficients and hence the name for the model is a logistic regression model. A logistic regression model can be equivalently written as follows:

Logistic regression model

Here, Logistic regression model is the binary error term that follows a Bernoulli distribution. For more information, refer to Chapter 17 of Tattar, et al. (2016). The estimation of the parameters of the logistic regression requires the iterative reweighted least squares (IRLS) algorithm, and we would use the glm R function to get this task done. We will use the Hypothyroid dataset in this section. In the previous section, the training and test datasets and formulas were already created, and we will carry on from that point.

Logistic regression for hypothyroid classification

For the hypothyroid dataset, we had HT2_Train as the training dataset. The test dataset is split as the covariate matrix in HT2_TestX and the outputs of the test dataset in HT2_TestY, while the formula for the logistic regression model is available in HT2_Formula. First, the logistic regression model is fitted to the training dataset using the glm function and the fitted model is christened LR_fit, and then we inspect it for model fit summaries using summary(LR_fit). The fitted model is then applied to the covariate data in the test part using the predict function to create LR_Predict. The predicted probabilities are then labeled in LR_Predict_Bin, and these labels are compared with the actual testY_numeric and overall accuracy is obtained:

> ntr <- nrow(HT2_Train) # Training size
> nte <- nrow(HT2_TestX) # Test size
> p <- ncol(HT2_TestX)
> testY_numeric <- as.numeric(HT2_TestY)
> LR_fit <- glm(HT2_Formula,data=HT2_Train,family = binomial())
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(LR_fit)
Call:
glm(formula = HT2_Formula, family = binomial(), data = HT2_Train)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.6390   0.0076   0.0409   0.1068   3.5127  
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -8.302025   2.365804  -3.509 0.000449 ***
Age         -0.024422   0.012145  -2.011 0.044334 *  
GenderMALE  -0.195656   0.464353  -0.421 0.673498    
TSH         -0.008457   0.007530  -1.123 0.261384    
T3           0.480986   0.347525   1.384 0.166348    
TT4         -0.089122   0.028401  -3.138 0.001701 ** 
T4U          3.932253   1.801588   2.183 0.029061 *  
FTI          0.197196   0.035123   5.614 1.97e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 609.00  on 1363  degrees of freedom
Residual deviance: 181.42  on 1356  degrees of freedom
AIC: 197.42
Number of Fisher Scoring iterations: 9
> LR_Predict <- predict(LR_fit,newdata=HT2_TestX,type="response")
> LR_Predict_Bin <- ifelse(LR_Predict>0.5,2,1)
> LR_Accuracy <- sum(LR_Predict_Bin==testY_numeric)/nte
> LR_Accuracy
[1] 0.9732704

It can be seen from the summary of the fitted GLM (the output following the line summary(LR_fit)) that we are having four significant variables in Age, TT4, T4U, and FTI. Using the predict function, we apply the fitted model on unknown test cases in HT2_TestX, compare it with the actuals, and find the accuracy to be 97.33%. Consequently, logistic regression is easily deployed in the R software.

主站蜘蛛池模板: 澎湖县| 内江市| 昌平区| 融水| 平阴县| 建昌县| 乐东| 永安市| 阿拉善盟| 通道| 漳州市| 铜川市| 阿克陶县| 淅川县| 东兴市| 巍山| 乌兰察布市| 高邑县| 上思县| 泽库县| 景宁| 健康| 潞城市| 兴业县| 许昌市| 恩施市| 济阳县| 兴城市| 蒲城县| 织金县| 丰顺县| 蒙城县| 深州市| 奎屯市| 洛扎县| 龙陵县| 竹山县| 肃宁县| 积石山| 漳平市| 渭源县|