- Effective Amazon Machine Learning
- Alexis Perrier
Expanding regression to classification with logistic regression
Amazon ML uses a linear regression model for regression as well as for binary and multiclass predictions. The logistic regression model is what extends continuous regression to classification problems.
A simple regression model with one predictor is modeled as follows:

y = ax + b
Here, x is the predictor, y is the outcome, and (a, b) are the model parameters. Each predicted value y is continuous and not bounded. How can we use that model to predict classes which are by definition categorical values?
Take the example of binary predictions. The method is to transform the continuous, unbounded predictions into probabilities, which all lie between 0 and 1. We then associate these probabilities with one of the two classes using a predefined threshold. This model is called the logistic regression model, a misleading name, since logistic regression is a classification model and not a regression one.
To transform continuous, unbounded values into probabilities, we use the sigmoid function defined as follows:

f(y) = 1 / (1 + e^(-y))
This function transforms any real number into a value within the [0,1] interval. Its output can, therefore, be interpreted as a probability:

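A minimal sketch of the sigmoid function in Python makes this mapping concrete (the function name and test values here are illustrative, not part of Amazon ML's API):

```python
import math

def sigmoid(y):
    """Map any real-valued prediction y into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-y))

# Large negative inputs approach 0, large positive inputs approach 1,
# and the midpoint y = 0 maps exactly to 0.5.
print(sigmoid(-10.0))  # close to 0
print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # close to 1
```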
In conclusion, the way to do binary classification with a regression model is as follows:
- Build the regression model, and estimate the real-valued outcomes y.
- Use the predicted value y as the argument of the sigmoid function. The result f(y) is a probability measure of belonging to one of the two classes.
- Set a threshold T in [0,1]. All predicted samples with a probability f(y) > T belong to one class; the others belong to the other class. The default value is T = 0.5.
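The three steps above can be sketched as follows; the model parameters (a, b) are assumed values for illustration, not fitted on real data:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict_class(x, a, b, threshold=0.5):
    """Turn a one-predictor regression model into a binary classifier."""
    y = a * x + b                       # step 1: real-valued regression output
    p = sigmoid(y)                      # step 2: probability of the positive class
    return 1 if p > threshold else 0    # step 3: apply the threshold T

# Illustrative parameters, chosen by hand.
a, b = 2.0, -1.0
print(predict_class(2.0, a, b))   # y = 3.0, p above the threshold
print(predict_class(-1.0, a, b))  # y = -3.0, p below the threshold
```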
Logistic regression is, by nature, a binary classifier. There are several strategies for turning a binary classifier into a multiclass classifier.
The one versus all (OvA) technique consists of selecting one class as positive and all the others as negative, which reduces the problem to binary classification. Once the classification on the first class is carried out, a second class is selected as positive versus all the others as negative. This process is repeated N-1 times when there are N classes to predict. The following set of plots shows:
- The original datasets and the classes for all the samples
- The result of the first binary classification (circles versus all the others)
- The result of the second classification that separates the squares and the triangles

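The OvA process for the three classes above can be sketched as a sequence of N-1 binary decisions; the class labels match the plots, but the linear scorers and their parameters are hypothetical, not trained on real data:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# One hypothetical linear scorer (a, b) per binary step; parameters
# are illustrative, not fitted.
ova_steps = [
    ("circle", (1.5, 0.0)),  # circles versus all the others
    ("square", (0.5, 1.0)),  # squares versus the remaining triangles
]

def predict_multiclass(x):
    """Run the N-1 binary classifiers in order; the first positive wins,
    and a sample rejected by every step gets the last remaining class."""
    for label, (a, b) in ova_steps:
        if sigmoid(a * x + b) > 0.5:
            return label
    return "triangle"

print(predict_multiclass(3.0))
print(predict_multiclass(-3.0))
```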