官术网_书友最值得收藏!

SQL Server and machine learning

SQL Server has integrated machine learning with the version of 2016, when the first R services were introduced and with the 2017 version, when Python language support was added to SQL Server, and the feature was renamed machine-learning services. Machine-learning services can be used to solve complex problems and the tasks and algorithms are chosen based on the data and the expected prediction.

There are two main flavors of machine learning:

  • Supervised
  • Unsupervised

The main difference between the two is that supervised learning is performed by having a ground truth available. This means that the predictive model is building the capabilities based on the prior knowledge of what the output values for a sample should be. Unsupervised learning, on the other hand, does not have any sort of this outputs and the goal is to find natural structures present in the datasets. This can mean grouping datapoints into clusters or finding different ways of looking at complex data so that it appears simpler or more organized.

Each type of machine learning is using different algorithms to solve the problem. Typical supervised learning algorithms would include the following:

  • Classification
  • Regression

When the data is being used to predict a category, supervised learning is also called classification. This is the case when assigning an image as a picture of either a car or a boat, for example. When there are only two options, it's called two-class classification. When there are more categories, such as when predicting the winner of the World Cup, this problem is known as multi-class classification.

The major algorithms used here would include the following:

  • Logistic regression
  • Decision tree/forest/jungle
  • Support vector machine
  • Bayes point machine

Regression algorithms can be used for both types of variables, continuous and discrete, to predict a value, where for continuous values you'll apply Linear Regression and on discrete values, Logistic Regression. In linear regression, the relationship between the input variable (x) and output variable (y) is expressed as an equation of the form y = a + bx. Thus, the goal of linear regression is to find out the values of coefficients a and b and fit a line that is nearest to most of the points. Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible. It is a fast and simple technique and a good first algorithm to try. 

Logistic regression is best suited for binary classification (datasets where y = 0 or 1, where 1 denotes the default class. So, if you're predicting whether an event will occur, the instance of event occurring will be classified as 1. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function.

There are numerous kinds of regressions available, depending on the Python or R package that is used; for example, we could use any of the following:

  • Bayesian linear regression
  • Linear regression
  • Poisson regression
  • Decision forest regression
  • Many others

We'll focus on implementing the machine-learning algorithms and models in the following chapters with SQL Server Machine Learning Services to train and predict the values from the dataset.

主站蜘蛛池模板: 论坛| 新兴县| 色达县| 饶河县| 丹东市| 临安市| 古田县| 鹤山市| 阳东县| 鹿邑县| 巴林右旗| 吴江市| 西城区| 谢通门县| 邵武市| 阜南县| 宁河县| 维西| 湄潭县| 新乡市| 化隆| 确山县| 太湖县| 襄汾县| 资兴市| 洞口县| 磐安县| 南平市| 洛阳市| 黎川县| 崇礼县| 黄山市| 山丹县| 罗平县| 乌什县| 龙胜| 神农架林区| 北安市| 达州市| 宁陵县| 铁岭市|