
Classification and logistic regression

In the previous section, we learned how to predict continuous quantities (for example, the impact of TV advertising on company sales) as linear functions of input values (for example, TV, radio, and newspaper advertisements). For other tasks, however, the output is not a continuous quantity. For example, predicting whether a patient has a disease is a classification problem, and we need a different learning algorithm to handle it. In this section, we are going to dig deeper into the mathematical analysis of logistic regression, a learning algorithm for classification tasks.

In linear regression, we tried to predict the value of the output variable y(i) for the ith sample x(i) in the dataset using a linear hypothesis function y = hθ(x) = θᵀx. This is not a great solution for classification tasks such as predicting binary labels (y(i) ∈ {0,1}).

Logistic regression is one of the many learning algorithms that we can use for classification tasks. It uses a different hypothesis class, with which we try to predict the probability that a specific sample belongs to the one class and the probability that it belongs to the zero class. So, in logistic regression, we will try to learn the following functions:
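$$h_\theta(x) = P(y = 1 \mid x; \theta) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$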

The function σ(z) = 1/(1 + e^{-z}) is often called the sigmoid or logistic function; it squashes the value of θᵀx into the fixed range [0,1], as shown in the following graph. Because the output is squashed into [0,1], we can interpret hθ(x) as a probability.

Our goal is to search for a value of the parameters θ so that the probability P(y = 1|x) = hθ(x) is large when the input sample x belongs to the one class and small when x belongs to the zero class:

Figure 6: Shape of the sigmoid function

So, suppose we have a set of training samples with their corresponding binary labels {(x(i), y(i)): i = 1,...,m}. We will need to minimize the following cost function, which measures how well a given hθ fits the training data:
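$$J(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$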

Note that only one of the two terms in the equation's summation is non-zero for each training sample (depending on whether the label y(i) is 0 or 1). When y(i) = 1, minimizing the cost function means we need to make hθ(x(i)) large, and when y(i) = 0, we want to make 1 - hθ(x(i)) large.

Now, we have a cost function that measures how well a given hypothesis hθ fits our training samples. We can learn to classify them by using an optimization technique to minimize J(θ) and find the best choice of the parameters θ. Once we have done this, we can use these parameters to classify a new test sample as 1 or 0 by checking which of the two class labels is more probable. If P(y = 1|x) < P(y = 0|x), we output 0; otherwise, we output 1. This is the same as defining a threshold of 0.5 between our classes and checking whether hθ(x) > 0.5.

To minimize the cost function J(θ), we can use a calculus tool called the gradient, which gives the direction of the greatest rate of increase of the cost function; we can then move in the opposite direction to approach the minimum of the function. The gradient of J(θ) is denoted by ∇θJ(θ), which means taking the gradient of the cost function with respect to the model parameters. Thus, we need to provide a function that computes J(θ) and ∇θJ(θ) for any requested choice of θ. If we derive the gradient of the cost function J(θ) with respect to θj, we get the following result:
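$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} x_j^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$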

This can be written in vector form as:
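$$\nabla_\theta J(\theta) = \sum_{i=1}^{m} x^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$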

Now that we have a mathematical understanding of logistic regression, let's go ahead and use this new learning method to solve a classification task.
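To make the preceding equations concrete, the following is a minimal NumPy sketch of logistic regression trained with batch gradient descent; the synthetic data, learning rate, and iteration count are illustrative assumptions, not values from the book:

```python
import numpy as np

def sigmoid(z):
    """The logistic function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """The cost J(theta): negative log-likelihood over the m training samples."""
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Gradient of J(theta) in vector form, averaged over m for a stable step size."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Illustrative synthetic data: one feature plus an intercept column (an assumption).
rng = np.random.default_rng(0)
m = 100
x = rng.normal(size=m)
y = (x + 0.5 * rng.normal(size=m) > 0).astype(float)  # noisy binary labels
X = np.column_stack([np.ones(m), x])                  # prepend the intercept term

# Batch gradient descent: repeatedly step opposite the gradient to minimize J(theta).
theta = np.zeros(X.shape[1])
learning_rate = 0.1  # assumed hyperparameter
for _ in range(1000):
    theta -= learning_rate * gradient(theta, X, y)

# Classify by thresholding h_theta(x) at 0.5, as described above.
predictions = (sigmoid(X @ theta) > 0.5).astype(int)
print("final cost:", cost(theta, X, y))
print("training accuracy:", np.mean(predictions == y))
```

Averaging the gradient over m only rescales J(θ) by a constant, so it does not change the minimizing θ, but it lets a single learning rate work across dataset sizes.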
