Logistic regression as a neural network

Logistic regression is a classification algorithm. Here, we try to predict the probabilities of the output classes; the class with the highest probability becomes the predicted output. The error between the actual and predicted output is calculated using cross-entropy and minimized through backpropagation. Check the following diagram for binary logistic regression and multi-class logistic regression. The difference lies in the problem statement: if the number of unique output classes is two, it's called binary classification; if it's more than two, it's called multi-class classification. If there are no hidden layers and we use the sigmoid function for binary classification, we get the architecture for binary logistic regression. Similarly, if there are no hidden layers and we use the softmax function for multi-class classification, we get the architecture for multi-class logistic regression.

Now a question arises: why not use the sigmoid function for multi-class logistic regression?

The answer, which holds for the predicted output layer of any neural network, is that the predicted outputs should follow a probability distribution. In plain terms, say the output has N classes. This results in N probabilities for a single input having, say, d dimensions. The sum of the N probabilities for that one input should be 1, and each of those probabilities should be between 0 and 1, inclusive.

On the one hand, the sum of the sigmoid function's outputs over N different classes will not be 1 in the majority of cases. Therefore, in the binary case, the sigmoid function is applied to obtain the probability of one class, that is, p(y = 1|x), and the probability of the other class is p(y = 0|x) = 1 − p(y = 1|x). On the other hand, the outputs of a softmax function always satisfy the probability distribution properties. In the diagram, σ refers to the sigmoid function:
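To see this concretely, here is a small NumPy sketch (the three logit values are hypothetical, chosen only for illustration) comparing the two activations: the sigmoid outputs for N = 3 classes do not sum to 1, whereas the softmax outputs do:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: squashes each score into (0, 1) independently
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Exponentiate and normalize so the outputs sum to 1
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

# Hypothetical raw scores (logits) for N = 3 classes of a single input
logits = np.array([2.0, 1.0, 0.1])

print(sigmoid(logits).sum())  # ~2.14 -- not a valid probability distribution
print(softmax(logits).sum())  # 1.0 -- a valid probability distribution
```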

A follow-up question might also arise: what if we use softmax in binary logistic regression?

As mentioned previously, as long as your predicted output follows the rules of a probability distribution, everything is fine. Later, we will discuss cross-entropy and the importance of probability distributions as a building block for any machine learning problem, especially those dealing with classification tasks.
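A quick numerical check makes this concrete. The following sketch (with a hypothetical raw score z) shows that a two-class softmax over the scores [0, z] yields exactly the same probabilities as the sigmoid, so using softmax in binary logistic regression still produces a valid probability distribution:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = 1.5  # a hypothetical raw score for the positive class
# Binary case via sigmoid: p(y=1|x), and p(y=0|x) = 1 - p(y=1|x)
p_sigmoid = np.array([1 - sigmoid(z), sigmoid(z)])
# The same case via a two-class softmax over the scores [0, z]
p_softmax = softmax(np.array([0.0, z]))

print(p_sigmoid)  # [0.1824... 0.8175...]
print(p_softmax)  # identical: a two-class softmax reduces to the sigmoid
```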

A probability distribution is valid if all the probabilities in the distribution are between 0 and 1, inclusive, and those probabilities sum to 1.

Logistic regression can be viewed as a very small neural network. Let's go through a step-by-step process of implementing binary logistic regression, as shown here:
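As a preview of those steps, here is a minimal end-to-end sketch in NumPy of binary logistic regression as a one-layer network, trained with gradient descent on the cross-entropy loss. The toy dataset, learning rate, and epoch count are assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 points, 2 features, labels from a linear rule
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Parameters of the "network": one weight per input dimension, plus a bias
w = np.zeros(2)
b = 0.0
lr = 0.1  # assumed learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    # Forward pass: predicted probability p(y=1|x)
    p = sigmoid(X @ w + b)
    # Binary cross-entropy loss, averaged over the batch
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # Backward pass: gradients of the loss with respect to w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Gradient-descent update
    w -= lr * grad_w
    b -= lr * grad_b

print(loss)  # should be small once training has converged
```

Note that the gradient of the cross-entropy loss with respect to the raw score simplifies to (p − y), which is why the backward pass above is just two lines.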
