Logistic regression – introduction and advantages

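Logistic regression applies maximum likelihood estimation after transforming the dependent variable into a logit variable (the natural log of the odds of the dependent variable occurring or not) with respect to the independent variables. In this way, logistic regression estimates the probability of a certain event occurring. In the following equation, the log of odds changes linearly as a function of the explanatory variables:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Here p is the probability of the event occurring, x_1, ..., x_n are the explanatory variables, and \beta_0, ..., \beta_n are the coefficients to be estimated.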

One might simply ask: why odds and log(odds), and not probability? In fact, this is a favorite question of interviewers in analytics interviews.

The reason is as follows:

By converting probability to log(odds), we expand the range from [0, 1] to [-∞, +∞]. If we fit a model directly on probability, we run into a restricted range problem, because a linear combination of variables is unbounded while probability is not. The log transformation also absorbs the non-linearity involved, so we can fit the model with a simple linear combination of variables.
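The range expansion described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `logit` and `sigmoid` and the sample probabilities are illustrative choices, not taken from the text.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit: maps any real number z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative probabilities, from very unlikely to very likely.
probs = [0.001, 0.1, 0.5, 0.9, 0.999]
log_odds = [logit(p) for p in probs]

for p, z in zip(probs, log_odds):
    # Odds live in (0, +inf); log(odds) lives in (-inf, +inf).
    print(f"p={p:<6} odds={p / (1 - p):10.4f} log(odds)={z:8.4f}")
```

Probabilities close to 0 map to large negative log(odds), probabilities close to 1 map to large positive log(odds), and 0.5 maps to 0, so the transformed target is no longer confined to a bounded interval.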

Another question one might ask is: what happens if someone fits a linear regression to a 0-1 problem instead of a logistic regression?

A brief explanation is provided with the following image:

  • Error terms tend to be large at the middle values of X (the independent variable) and small at the extreme values, which violates the linear regression assumptions that errors should have zero mean and be normally distributed
  • The model generates nonsensical predictions, greater than 1 or less than 0, at the end values of X
  • The ordinary least squares (OLS) estimates are inefficient and the standard errors are biased
  • Error variance is high at the middle values of X and low at the ends, so the errors do not have constant variance

All the preceding issues are solved by using logistic regression.
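To make the out-of-range prediction problem concrete, here is a rough sketch that fits both models to the same 0-1 data using only the Python standard library. The toy data, the learning rate, and the iteration count are assumptions chosen for illustration; the logistic model is fitted with plain gradient ascent on the log-likelihood, which is one simple way to carry out maximum likelihood estimation and not necessarily what a statistics package does internally.

```python
import math

# Hypothetical toy data: one feature x and a 0-1 outcome y (not from the text).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
n = len(xs)

# --- Linear regression fitted by ordinary least squares (closed form) ---
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_mean - slope * x_mean
ols_pred = [intercept + slope * x for x in xs]

# --- Logistic regression fitted by gradient ascent on the mean log-likelihood ---
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = 0.0, 0.0
learning_rate = 0.05  # assumed step size, small enough to be stable on this data
for _ in range(20000):
    residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
    b0 += learning_rate * sum(residuals) / n
    b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
logit_pred = [sigmoid(b0 + b1 * x) for x in xs]

for x, lin, log in zip(xs, ols_pred, logit_pred):
    print(f"x={x}  linear={lin:7.3f}  logistic={log:6.3f}")
```

On this data the fitted straight line drops below 0 at the smallest X and rises above 1 at the largest X, while the logistic predictions stay strictly between 0 and 1 because they pass through the sigmoid function.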
