官术网_书友最值得收藏!

Supervised learning

A supervised scenario is characterized by the concept of a teacher or supervisor, whose main task is to provide the agent with a precise measure of its error (directly comparable with output values). With actual algorithms, this function is provided by a training set made up of couples (input and expected output). Starting from this information, the agent can correct its parameters so as to reduce the magnitude of a global loss function. After each iteration, if the algorithm is flexible enough and data elements are coherent, the overall accuracy increases and the difference between the predicted and expected value becomes close to zero. Of course, in a supervised scenario, the goal is training a system that must also work with samples never seen before. So, it's necessary to allow the model to develop a generalization ability and avoid a common problem called overfitting, which causes an overlearning due to an excessive capacity (we're going to discuss this in more detail in the next chapters, however we can say that one of the main effects of such a problem is the ability to predict correctly only the samples used for training, while the error for the remaining ones is always very high).

In the following figure, a few training points are marked with circles and the thin blue line represents a perfect generalization (in this case, the connection is a simple segment):

Two different models are trained with the same datasets (corresponding to the two larger lines). The former is unacceptable because it cannot generalize and capture the fastest dynamics (in terms of frequency), while the latter seems a very good compromise between the original trend and a residual ability to generalize correctly in a predictive analysis.

Formally, the previous example is called regression because it's based on continuous output values. Instead, if there is only a discrete number of possible outcomes (called categories), the process becomes a classification. Sometimes, instead of predicting the actual category, it's better to determine its probability distribution. For example, an algorithm can be trained to recognize a handwritten alphabetical letter, so its output is categorical (in English, there'll be 26 allowed symbols). On the other hand, even for human beings, such a process can lead to more than one probable outcome when the visual representation of a letter isn't clear enough to belong to a single category. That means that the actual output is better described by a discrete probability distribution (for example, with 26 continuous values normalized so that they always sum up to 1). 

In the following figure, there's an example of classification of elements with two features. The majority of algorithms try to find the best separating hyperplane (in this case, it's a linear problem) by imposing different conditions. However, the goal is always the same: reducing the number of misclassifications and increasing the noise-robustness. For example, look at the triangular point that is closer to the plane (its coordinates are about [5.1 - 3.0]). If the magnitude of the second feature were affected by noise and so the value were quite smaller than 3.0, a slightly higher hyperplane could wrongly classify it. We're going to discuss some powerful techniques to solve these problems in later chapters.

Common supervised learning applications include:

  • Predictive analysis based on regression or categorical classification
  • Spam detection
  • Pattern detection
  • Natural Language Processing
  • Sentiment analysis
  • Automatic image classification
  • Automatic sequence processing (for example, music or speech)
主站蜘蛛池模板: 贡嘎县| 肥乡县| 锡林郭勒盟| 北京市| 凯里市| 黄冈市| 浦江县| 岳普湖县| 汝州市| 马龙县| 乳源| 延庆县| 益阳市| 巴林右旗| 屯昌县| 阿拉善盟| 确山县| 威宁| 西峡县| 西丰县| 孝昌县| 合山市| 饶阳县| 霍邱县| 齐河县| 平和县| 龙川县| 遂溪县| 南丹县| 弥勒县| 龙陵县| 井研县| 德保县| 凤山县| 尉氏县| 内乡县| 沽源县| 延庆县| 乐业县| 龙泉市| 兴文县|