Backpropagation and stochastic gradient descent

Backpropagation, or the backward propagation of errors, is the most commonly used supervised learning algorithm for adapting the connection weights.

Considering the error or the cost as a function of the weights W and b, a local minimum of the cost function can be approached with gradient descent, which consists of changing the weights along the negative gradient of the error:

$$W \leftarrow W - \lambda \, \nabla_W \mathrm{cost} \qquad b \leftarrow b - \lambda \, \nabla_b \mathrm{cost}$$

Here, $\lambda$ is the learning rate, a positive constant defining the speed of the descent.

The following compiled function updates the variables after each feedforward run:

# gradients of the cost with respect to the parameters
g_W = T.grad(cost=cost, wrt=W)
g_b = T.grad(cost=cost, wrt=b)

learning_rate = 0.13
index = T.lscalar()

# compile a function that performs one gradient descent step
# on the mini-batch selected by index
train_model = theano.function(
    inputs=[index],
    outputs=[cost, error],
    updates=[(W, W - learning_rate * g_W),
             (b, b - learning_rate * g_b)],
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

The only input variable is the index of the mini-batch, since the whole dataset has been transferred to the GPU in a single pass and stored in shared variables.
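
As a side note, here is a minimal sketch, not part of the original listing, of how such shared variables are typically created, assuming train_set is a (data, labels) pair of NumPy arrays:

import numpy as np
import theano
import theano.tensor as T

# features stored on the device in floatX precision, labels cast to int32
train_set_x = theano.shared(np.asarray(train_set[0], dtype=theano.config.floatX),
                            borrow=True)
train_set_y = T.cast(theano.shared(np.asarray(train_set[1], dtype=theano.config.floatX),
                                   borrow=True), 'int32')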

Training consists of presenting each mini-batch to the model iteratively (iterations) and repeating this pass over the full dataset many times (epochs):

n_epochs = 1000
print_every = 1000

n_train_batches = train_set[0].shape[0] // batch_size
n_iters = n_epochs * n_train_batches
train_loss = np.zeros(n_iters)
train_error = np.zeros(n_iters)

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        # global iteration counter across epochs
        iteration = minibatch_index + n_train_batches * epoch
        train_loss[iteration], train_error[iteration] = train_model(minibatch_index)
        if (epoch * train_set[0].shape[0] + minibatch_index) % print_every == 0:
            print('epoch {}, minibatch {}/{}, training error {:02.2f} %, training loss {}'.format(
                epoch,
                minibatch_index + 1,
                n_train_batches,
                train_error[iteration] * 100,
                train_loss[iteration]
            ))

This only reports the loss and error on one mini-batch, though. It would be good to also report the average over the whole dataset.
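
One simple way to do this, sketched here rather than taken from the original listing, is to average the values already recorded for the current epoch at the end of each epoch, just after the inner mini-batch loop:

# at the end of each epoch, average the recorded per-mini-batch values
epoch_slice = slice(epoch * n_train_batches, (epoch + 1) * n_train_batches)
print('epoch {} done: mean training error {:02.2f} %, mean training loss {}'.format(
    epoch,
    train_error[epoch_slice].mean() * 100,
    train_loss[epoch_slice].mean()
))

Note that this averages values recorded while the weights were still being updated during the epoch; a more faithful estimate would re-evaluate the whole training set with the final weights.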

The error rate drops very quickly during the first iterations, then slows down.

Execution time on a laptop with a GeForce GTX 980M GPU is 67.3 seconds, while on an Intel i7 CPU it is 3 minutes and 7 seconds.

After a long while, the model converges to a 5.3 - 5.5% error rate. A few more iterations could bring it further down, but could also lead to overfitting. Overfitting occurs when the model fits the training data well but does not achieve the same error rate on unseen data.

In this case, the model is too simple to overfit on this data.

A model that is too simple cannot learn very well. The principle of deep learning is to add more layers, that is, to increase the depth and build deeper networks, in order to gain better accuracy.

We'll see in the following section how to compute a better estimate of the model's accuracy and when to stop training.
