
Algorithms in machine learning

So far, we have developed an abstract understanding of machine learning. We understand the definition of machine learning, which states that a task T can be learned by a computer program utilizing data in the form of experience E when its performance P improves with that experience. We have also seen how machine learning differs from conventional programming paradigms: instead of coding each and every step, we let the program form an understanding of the problem space and help us solve it. It is rather surprising to see such a program work right in front of us.

All along, while we learned about the concept of machine learning, we treated this magical computer program as a mysterious black box that learns and solves problems for us. Now is the time to unravel the enigma, look under the hood, and see these magical algorithms in their full glory.

We will begin with some of the most widely used algorithms in machine learning, looking at their intricacies, usage, and a bit of mathematics wherever necessary. Through this chapter, you will be introduced to different families of algorithms. The list is by no means exhaustive and, even though the algorithms will be explained in fair detail, a deep theoretical understanding of each of them is beyond the scope of this book. There is plenty of material readily available in the form of books, online courses, blogs, and more.

Perceptron

This is like the Hello World algorithm of the machine learning universe. It may be one of the easiest of the lot to understand and use, but it is by no means any less powerful.

Published in 1958 by Frank Rosenblatt, the perceptron algorithm gained much attention because it is guaranteed to find a separator for any linearly separable data set.

A perceptron is a function (or, to be precise, a simplified neuron) that takes a vector of real numbers as input and generates a real number as output.

Mathematically, a perceptron can be represented as:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where $w_1, \ldots, w_n$ are weights, $b$ is a constant termed the bias, $x_1, \ldots, x_n$ are inputs, and $y$ is the output of the function $f$, which is called the activation function.
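
To make this concrete, here is a minimal sketch of a single perceptron in R with a sign activation (the function name and example values here are illustrative; the full training code appears later in this section):

#a single perceptron with a sign activation (illustrative sketch)
#x: input vector, w: weight vector, b: bias
perceptron_output <- function(x, w, b) {
  ifelse(sum(w * x) + b >= 0, +1, -1)
}

#example: two inputs with equal weights and a negative bias
perceptron_output(c(1, 0), w = c(0.5, 0.5), b = -0.25) #returns +1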

The algorithm is as follows:

  1. Initialize weight vector w and bias b to small random numbers.
  2. Calculate the output vector y based on the function f and vector x.
  3. Update the weight vector w and bias b to counter the error (the exact update rule is shown after these steps).
  4. Repeat steps 2 and 3 until there is no error or the error drops below a certain threshold.
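
Concretely, when a training point $(x_i, y_i)$ is misclassified in step 3, the update moves the separator toward that point:

$$w \leftarrow w + \eta \, y_i x_i \qquad b \leftarrow b + \eta \, y_i R^2$$

Here $\eta$ is the learning rate and $R$ is a constant no smaller than the norm of the furthest input point; both appear in the implementation later in this section.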

The algorithm tries to find a separator that divides the input into two classes by using a labeled data set called the training data set (the training data set corresponds to the experience E as stated in the definition of machine learning in the previous section). The algorithm starts by assigning random values to the weight vector w and the bias b. It then processes the input based on the function f and produces a vector y. This generated output is then compared with the correct output values from the training data set, and w and b are updated accordingly. To understand the weight update process, let us consider a point, say p1, with a correct output value of +1. Now, suppose the perceptron misclassifies p1 as -1. It then updates the weight vector w and bias b to move the separator a small amount (the movement is restricted by the learning rate to prevent sudden jumps) in the direction of p1 in order to classify it correctly. The algorithm stops when it finds the correct separator or when the error in classifying the inputs drops below a certain user-defined threshold.

Now, let us see the algorithm in action with the help of a small example.

For the algorithm to work, we need a linearly separable data set. Let us assume the data is labeled by the following rule:

$$y = \begin{cases} +1 & \text{if } x_2 > x_1 + \frac{1}{2} \\ -1 & \text{otherwise} \end{cases}$$

Based on the preceding equation, the correct separator will be given as:

$$x_2 = x_1 + \frac{1}{2}$$

Generating the input matrix x from uniformly distributed data is done in R as follows:

#30 random numbers between -1 and 1, uniformly distributed
x1 <- runif(30,-1,1)
x2 <- runif(30,-1,1)
#form the input matrix x (one row per point)
x <- cbind(x1,x2)
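
Given the labeling rule stated previously, the corresponding labels Y (the experience E used for training) can be generated as follows; note that the specific line x2 = x1 + 1/2 is an assumed choice, and any linearly separable labeling would work equally well:

#label each point: +1 above the assumed line x2 = x1 + 1/2, -1 below it
Y <- ifelse(x2 > x1 + 0.5, +1, -1)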

Now that we have the data, we need a function to classify it into one of the two categories.

#helper function to calculate the signed distance from the hyperplane
calculate_distance = function(x, w, b) {
  sum(x * w) + b
}

#linear classifier
linear_classifier = function(x, w, b) {
  distances = apply(x, 1, calculate_distance, w, b)
  return(ifelse(distances < 0, -1, +1))
}
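
As a quick sanity check, the classifier can be called with hand-picked weights before any training takes place; the values below encode the assumed line x2 = x1 + 1/2 and are purely illustrative:

#classify with w = (-1, 1) and b = -0.5, that is, the line x2 = x1 + 1/2
head(linear_classifier(x, c(-1, 1), -0.5))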

The helper function, calculate_distance, computes the signed distance of each point from the separator, while linear_classifier uses the sign of this distance to assign each point to class -1 or class +1.

The perceptron algorithm then uses the preceding classifier function to find the correct separator using the training data set.

#function to calculate the second norm (Euclidean length) of a vector
second_norm = function(x) {sqrt(sum(x * x))}

#perceptron training algorithm
perceptron = function(x, y, learning_rate=1) {

  w = rep(0, ncol(x)) # initialize the weight vector
  b = 0               # initialize the bias
  k = 0               # count the number of updates

  #constant with a value greater than the distance of the furthest point
  R = max(apply(x, 1, second_norm))

  incorrect = TRUE # flag indicating a misclassified point was found

  #initialize the plot
  plot(x, cex=0.2)

  #loop until a correct classifier is found
  while (incorrect) {

    incorrect = FALSE

    #classify with the current weights
    yc <- linear_classifier(x, w, b)
    #loop over each point in the input x
    for (i in 1:nrow(x)) {
      #update the weights if the point is not classified correctly
      if (y[i] != yc[i]) {
        w <- w + learning_rate * y[i] * x[i,]
        b <- b + learning_rate * y[i] * R^2
        k <- k + 1

        #current classifier's components;
        #update the plot after every 5 updates
        if (k %% 5 == 0) {
          intercept <- -b / w[[2]]
          slope <- -w[[1]] / w[[2]]
          #plot the classifier hyperplane
          abline(intercept, slope, col="red")
          #wait for user input
          cat("Iteration # ", k, "\n")
          cat("Press [enter] to continue")
          line <- readline()
        }
        incorrect = TRUE
      }
    }
  }

  s = second_norm(w)
  #scale the classifier to a unit weight vector
  return(list(w=w/s, b=b/s, updates=k))
}

It's now time to train the perceptron!

#train the perceptron
p <- perceptron(x,Y)
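
The returned list contains the unit-scaled weight vector, the scaled bias, and the number of updates performed, and can be inspected directly:

#inspect the learned separator
p$w       #unit-length weight vector
p$b       #bias, scaled by the same factor
p$updates #number of weight updates made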

The plot will look as follows:

The perceptron working its way to find the correct classifier. The correct classifier is shown in green

The preceding plots show the perceptron's training progress. Each incorrect classifier that the algorithm tries is drawn as a red line. As shown, the perceptron stops after finding the correct classifier, which is marked in green.

A zoomed-in view of the final separator can be seen as follows:

#classify based on the learned weights and bias
y <- linear_classifier(x,p$w,p$b)

plot(x,cex=0.2)

#zoom into the points near the separator and color code them,
#marking data points with y=+1 as + and the others as -
points(subset(x,Y==1),col="black",pch="+",cex=2)
points(subset(x,Y==-1),col="red",pch="-",cex=2)

#compute the intercept of the separator on the y axis
#from w and b
intercept <- - p$b / p$w[[2]]

#compute the slope of the separator from w
slope <- - p$w[[1]] / p$w[[2]]

#draw the separating boundary
abline(intercept,slope,col="green")
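
Since the data set is linearly separable, the learned separator should classify every training point correctly, which is easy to verify:

#every predicted label should match the true label
all(y == Y) #TRUE for a correct separator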

The plot looks as shown in the following figure:

The correct classifier function as found by the perceptron
