- Machine Learning for OpenCV
- Michael Beyeler
Understanding linear regression
The simplest regression model is linear regression. The idea behind linear regression is to describe a target variable (such as Boston house prices) as a linear combination of features.
To keep things simple, let's just focus on two features. Let's say we want to predict tomorrow's stock prices using two features: today's stock price and yesterday's stock price. We will denote today's stock price as the first feature f1, and yesterday's stock price as f2. Then the goal of linear regression would be to learn two weight coefficients, w1 and w2, so that we can predict tomorrow's stock price as follows:
$\hat{y} = w_1 f_1 + w_2 f_2$
Here, $\hat{y}$ is the prediction of tomorrow's ground-truth stock price $y$.
We could easily extend this to include more stock price samples from the past. If we had M feature values instead of two, we would extend the preceding equation to a sum of M products, so that every feature is accompanied by a weight coefficient. We can write the resulting equation as follows:
$\hat{y} = \sum_{j=1}^{M} w_j f_j$
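Since this weighted sum is just a dot product between a weight vector and a feature vector, it is easy to compute. Here is a minimal NumPy sketch (the weight and feature values below are made up purely for illustration):

```python
import numpy as np

# Made-up weights and features for M = 3 past prices (illustrative only)
w = np.array([0.7, 0.2, 0.1])      # weight coefficients w_1, ..., w_M
f = np.array([101.0, 99.5, 98.0])  # feature values f_1, ..., f_M

# The prediction y_hat is the weighted sum of the features, i.e. a dot product
y_hat = np.dot(w, f)
print(y_hat)  # 0.7*101.0 + 0.2*99.5 + 0.1*98.0 = 100.4
```

The two-feature stock example from before is just the special case M = 2.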
Let's think about this equation geometrically for a second. In the case of a single feature, $f_1$, the equation for $\hat{y}$ would become $\hat{y} = w_1 f_1$, which is essentially a straight line. In the case of two features, $\hat{y} = w_1 f_1 + w_2 f_2$ would describe a plane in the feature space, as illustrated in the following figure:
[Figure: a straight line (one feature) and a plane (two features) in the feature space, both passing through the origin]
As is evident in the preceding figure, all of these lines and planes pass through the origin. But what if the true $y$ values we are trying to approximate don't go through the origin?
In order to be able to offset $\hat{y}$ from the origin, it is customary to add an extra weight coefficient that does not depend on any feature values and thus acts as a bias term. In the 1D case, this term is the $y$-intercept. In practice, this is often achieved by setting $f_0 = 1$ so that $w_0$ can act as the bias term:
$\hat{y} = \sum_{j=0}^{M} w_j f_j = w_0 + w_1 f_1 + \dots + w_M f_M, \qquad f_0 = 1$
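In code, this bias trick amounts to prepending a column of ones to the feature matrix. A minimal NumPy sketch (the array shapes and values are illustrative assumptions, not data from this chapter):

```python
import numpy as np

# Made-up feature matrix: 4 samples, 2 features each (illustrative only)
F = np.array([[101.0, 99.5],
              [ 99.5, 98.0],
              [ 98.0, 97.2],
              [ 97.2, 96.8]])

# Prepend a constant column f_0 = 1 so that w_0 can act as the bias term
F_bias = np.hstack([np.ones((F.shape[0], 1)), F])

w = np.array([1.5, 0.7, 0.3])  # [w_0, w_1, w_2], made-up values
y_hat = F_bias @ w             # every prediction is now offset by w_0
```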
Finally, the goal of linear regression is to learn a set of weight coefficients that lead to predictions that approximate the ground-truth values as accurately as possible. Rather than explicitly capturing a model's accuracy, as we did with classifiers, scoring functions in regression often take the form of so-called cost functions (or loss functions).
As discussed earlier in this chapter, there are a number of scoring functions we can use to measure the performance of a regression model. The most commonly used cost function is probably the mean squared error, which computes the squared error $(y_i - \hat{y}_i)^2$ for every data point $i$ by comparing the prediction $\hat{y}_i$ to the target output value $y_i$, and then averages over all $N$ data points:

$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
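Computed directly, the mean squared error is just the average of the squared residuals. A minimal sketch (the target and prediction vectors below are made up for illustration):

```python
import numpy as np

y_true = np.array([100.0, 99.0, 98.5, 99.5])  # ground-truth targets (made up)
y_pred = np.array([100.4, 98.6, 98.9, 99.2])  # model predictions (made up)

# Mean squared error: average of (y_i - y_hat_i)^2 over all data points
mse = np.mean((y_true - y_pred) ** 2)
```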
Regression then becomes an optimization problem--the task is to find the set of weights that minimizes this cost function.
But enough with all this theory--let's do some coding!
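As a first taste, minimizing the mean squared error has a closed-form least-squares solution, so we can already sketch the whole pipeline in a few lines of NumPy. The feature matrix and targets below are made-up assumptions, not the book's dataset:

```python
import numpy as np

# Made-up training data: 4 samples, a bias column of ones plus 2 features
F = np.array([[1.0, 101.0, 99.5],
              [1.0,  99.5, 98.0],
              [1.0,  98.0, 97.2],
              [1.0,  97.2, 96.8]])
y = np.array([100.0, 99.0, 98.5, 99.5])  # ground-truth targets (made up)

# np.linalg.lstsq finds the weights minimizing ||F @ w - y||^2,
# which is the mean squared error up to a constant factor of 1/N
w, residuals, rank, sv = np.linalg.lstsq(F, y, rcond=None)

y_hat = F @ w  # predictions of the fitted model
```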