
Support vector machines

A support vector machine (SVM) is a supervised machine learning algorithm that is mainly used for classification. It is the most popular member of the kernel method class of algorithms. An SVM tries to find a hyperplane that separates the samples in the dataset.

A hyperplane is a subspace whose dimension is one less than that of the space around it. For example, a hyperplane in a one-dimensional space is a point, and in a two-dimensional space it would just be a line. We can think of classification as a process of trying to find a hyperplane that separates different groups of data points. Once we have defined our features, every sample (in our case, an email) in the dataset can be thought of as a point in the multidimensional space of features. One dimension of that space represents all the possible values of one feature. The coordinates of a point (a sample) are the specific values of each feature for that sample. The ML algorithm's task is to draw a hyperplane that separates points with different classes. In our case, the hyperplane would separate spam from non-spam emails.
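To make this concrete, here is a minimal sketch (not from the book) of mapping emails to points in a feature space, assuming scikit-learn's CountVectorizer as the featurizer; the example emails are invented for illustration:

from sklearn.feature_extraction.text import CountVectorizer

# Two illustrative emails (hypothetical data, not from the book's dataset).
emails = ["win a free prize now", "meeting agenda for monday"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Each row is one email as a point in feature space;
# each column is one feature (here, the count of one word).
print(vectorizer.get_feature_names_out())
print(X.toarray())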

In the following diagram, you can see two classes of points (red and blue) in a two-dimensional feature space (the x and y axes). If both the x and y values of a point are below five, then the point is blue. In all other cases, the point is red. In the top image, the classes are linearly separable, meaning we can separate them with a hyperplane. Conversely, the classes in the bottom image are linearly inseparable:

The SVM tries to find the hyperplane that maximizes the margin: the distance between the hyperplane and the points nearest to it. In other words, of all the possible hyperplanes that can separate the samples, the SVM finds the one that is farthest from the closest points. In addition, SVMs can also deal with data that is not linearly separable. There are two methods for this: introducing soft margins or using the kernel trick.
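As a minimal sketch of a maximum-margin classifier (assuming scikit-learn is available; the toy points below are invented for illustration):

import numpy as np
from sklearn.svm import SVC

# Two small, well-separated clusters of 2-D points.
X = np.array([[1, 2], [2, 1], [2, 3], [8, 9], [9, 8], [8, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# kernel='linear' fits a separating hyperplane (a line in 2-D)
# that maximizes the distance to the nearest points of each class.
clf = SVC(kernel='linear')
clf.fit(X, y)

print(clf.predict([[2, 2], [9, 9]]))  # expected: [0 1]
print(clf.support_vectors_)           # the points that define the margin

The support vectors printed at the end are the only points the hyperplane ultimately depends on, which is where the algorithm gets its name.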

Soft margins work by allowing a few misclassified elements while retaining most of the algorithm's predictive ability. In practice, relaxing the strict-separation constraint in this way also helps us avoid overfitting the machine learning model.
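In scikit-learn, for example, the softness of the margin is controlled by the regularization parameter C. The following is a hedged sketch; the mislabeled point is invented to make strict separation impossible:

import numpy as np
from sklearn.svm import SVC

# Toy data with one point sitting inside the wrong cluster,
# so a strictly separating hyperplane does not exist.
X = np.array([[1.0, 1.0], [2.0, 2.0], [1.0, 3.0],
              [8.0, 8.0], [9.0, 7.0], [2.0, 1.5]])
y = np.array([0, 0, 0, 1, 1, 1])  # the last point is labeled 1 but lies among class 0

# Small C -> soft margin: the outlier may stay misclassified,
# but the hyperplane remains general. Large C tries to chase
# every point and is more prone to overfitting.
soft = SVC(kernel='linear', C=0.1).fit(X, y)
hard = SVC(kernel='linear', C=1000.0).fit(X, y)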

The kernel trick solves the same problem in a different way. Imagine that we have a two-dimensional feature space, but the classes are linearly inseparable. The kernel trick uses a kernel function that transforms the data by adding more dimensions to it. In our case, after the transformation, the data will be three-dimensional. The linearly inseparable classes in the two-dimensional space will become linearly separable in three dimensions, and our problem is solved:

In the graph on the left, we can see a non-linearly separable dataset before the kernel was applied. On the right, we can see the same dataset after the kernel has been applied, and the data can now be linearly separated.
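A minimal sketch of the same idea in code (assuming scikit-learn is available; make_circles generates a non-linearly separable dataset much like the one in the figure):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM fails, while an RBF kernel implicitly maps the points
# into a higher-dimensional space where a separating hyperplane exists.
linear_clf = SVC(kernel='linear').fit(X, y)
rbf_clf = SVC(kernel='rbf').fit(X, y)

print('linear accuracy:', linear_clf.score(X, y))  # well below 1.0
print('rbf accuracy:', rbf_clf.score(X, y))        # close to 1.0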