
Supervised learning in practice with Python

As we said earlier, supervised learning algorithms learn to approximate the function that maps inputs to outputs, creating a model that can predict future outputs given unseen inputs.

It's conventional to denote inputs as x and outputs as y; both can be numerical or categorical.

We can distinguish between two types of supervised learning:

  • Classification
  • Regression

Classification is a task where the output variable can take on a finite number of values, called categories. An example of classification would be classifying different types of flowers (output) given the sepal length (input); a minimal code sketch follows the list below. Classification can be further divided into several subtypes:

  • Binary classification: The task of predicting whether an instance belongs to one of two classes
  • Multiclass classification: The task (also known as multinomial classification) of predicting the most probable label (class) for each instance
  • Multilabel classification: The task of assigning one or more labels to each input
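As a minimal sketch of the flower example above (the scikit-learn calls, the built-in Iris dataset, and the choice of a decision tree are illustrative assumptions, not code from the book), a multiclass classifier could be trained to predict a flower's species from its sepal length alone:

```python
# Minimal multiclass classification sketch (illustrative assumption:
# scikit-learn and its built-in Iris dataset; not code from the book).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, [0]]   # input: sepal length (cm) only
y = iris.target         # output: one of three flower species (categories)

# Hold out part of the data so predictions are made on unseen inputs.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))   # predicted categories for five unseen flowers
```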

Regression is a task where the output variable is continuous. Here are two common regression algorithms, both sketched in code after the list:

  • Linear regression: This finds linear relationships between inputs and outputs
  • Logistic regression: This estimates the probability of a binary output
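The following is a minimal sketch of both algorithms; the toy data and the scikit-learn usage are assumptions made for illustration, not examples from the book:

```python
# Minimal sketch of linear and logistic regression with scikit-learn
# (toy data invented for illustration; not code from the book).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: recover a roughly linear relationship y ~ 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.1, 6.9, 9.0])
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)       # roughly a slope of 2 and an intercept of 1

# Logistic regression: estimate the probability of a binary output.
X_bin = np.array([[0.5], [1.5], [2.5], [3.5]])
y_bin = np.array([0, 0, 1, 1])
log_reg = LogisticRegression().fit(X_bin, y_bin)
print(log_reg.predict_proba([[2.0]]))  # probabilities of class 0 and class 1
```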

In general, the supervised learning problem is solved in a standard way by performing the following steps (an end-to-end code sketch follows the list):

  1. Performing data cleaning to make sure the data we are using is as accurate and descriptive as possible.
  2. Executing the feature engineering process, which involves creating new features from the existing ones to improve the algorithm's performance.
  3. Transforming the input data into something our algorithm can understand, which is known as data transformation. Some algorithms, such as neural networks, don't work well with unscaled data because they naturally give more importance to inputs with a larger magnitude.
  4. Choosing an appropriate model (or a few of them) for the problem.
  5. Choosing an appropriate metric to measure the effectiveness of our algorithm.
  6. Training the model on a subset of the available data, called the training set. The data transformations from step 3 are also calibrated on this training set.
  7. Testing the model on the remaining held-out data (the test set).
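Put together, a compact end-to-end sketch of these steps could look like the following; the dataset, the scikit-learn model, and the accuracy metric are assumptions chosen for illustration rather than the book's own choices:

```python
# Compact sketch of the workflow above (dataset, model, and metric are
# illustrative assumptions, not the book's choices).
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Step 6: split off a training set; the test set is only used at the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 3: data transformation; the scaler is calibrated on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 4 and 6: choose a model and train it on the training set.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42)
model.fit(X_train_scaled, y_train)

# Steps 5 and 7: measure effectiveness with a chosen metric on the test set.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test_scaled)))
```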