- Deep Learning with Theano
- Christopher Bourez
Backpropagation and stochastic gradient descent
Backpropagation, or the backward propagation of errors, is the most commonly used supervised learning algorithm for adapting the connection weights.
Considering the error or the cost as a function of the weights W and b, a local minimum of the cost function can be approached with gradient descent, which consists of changing the weights along the negative error gradient:

$$ W \leftarrow W - \lambda \, \frac{\partial \mathrm{cost}}{\partial W}, \qquad b \leftarrow b - \lambda \, \frac{\partial \mathrm{cost}}{\partial b} $$

Here, $\lambda$ is the learning rate, a positive constant defining the speed of the descent.
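As a minimal illustration of this update rule (a toy sketch in plain NumPy, not from the book's example; W_target and toy_grad are hypothetical names used only here), a single gradient-descent step on a simple quadratic cost looks like this:

import numpy as np

# Toy quadratic cost: cost(W) = sum((W - W_target)**2), whose gradient is 2 * (W - W_target)
W_target = np.array([1.0, -2.0])
W = np.zeros(2)
learning_rate = 0.13

def toy_grad(W):
    return 2.0 * (W - W_target)

# One gradient descent step: move W along the negative gradient
W = W - learning_rate * toy_grad(W)
print(W)    # [ 0.26 -0.52], one step closer to W_target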
The following compiled function updates the variables after each feedforward run:
g_W = T.grad(cost=cost, wrt=W)
g_b = T.grad(cost=cost, wrt=b)

learning_rate = 0.13
index = T.lscalar()

train_model = theano.function(
    inputs=[index],
    outputs=[cost, error],
    updates=[(W, W - learning_rate * g_W), (b, b - learning_rate * g_b)],
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
The input variable is the index of the batch, since the whole dataset has been transferred to the GPU in one pass via shared variables.
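For reference, the following is a sketch of how such shared variables are typically created (the loading code is not shown in this excerpt; it assumes train_set is an (inputs, labels) pair of NumPy arrays, as in the training loop below):

import numpy as np
import theano
import theano.tensor as T

# Assumption: train_set = (inputs, labels), two NumPy arrays.
# Inputs are stored in Theano's float type so they can live on the GPU;
# labels are stored the same way and cast back to int32 when used.
train_set_x = theano.shared(np.asarray(train_set[0], dtype=theano.config.floatX),
                            borrow=True)
train_set_y = T.cast(theano.shared(np.asarray(train_set[1], dtype=theano.config.floatX),
                                   borrow=True),
                     'int32')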
Training consists of presenting each mini-batch of samples to the model iteratively (iterations) and repeating the operation over the whole dataset many times (epochs):
n_epochs = 1000
print_every = 1000

n_train_batches = train_set[0].shape[0] // batch_size
n_iters = n_epochs * n_train_batches

train_loss = np.zeros(n_iters)
train_error = np.zeros(n_iters)

for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        iteration = minibatch_index + n_train_batches * epoch
        train_loss[iteration], train_error[iteration] = train_model(minibatch_index)
        if (epoch * train_set[0].shape[0] + minibatch_index) % print_every == 0:
            print('epoch {}, minibatch {}/{}, training error {:02.2f} %, training loss {}'.format(
                epoch,
                minibatch_index + 1,
                n_train_batches,
                train_error[iteration] * 100,
                train_loss[iteration]
            ))
This only reports the loss and error on one mini-batch, though. It would be good to also report the average over the whole dataset.
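One possible way to do that (a sketch, not part of the original listing) is to average the values already recorded in train_loss and train_error at the end of each epoch, inside the outer loop:

# At the end of each epoch, average the per-minibatch values recorded above
epoch_slice = slice(epoch * n_train_batches, (epoch + 1) * n_train_batches)
print('epoch {}: mean training error {:02.2f} %, mean training loss {}'.format(
    epoch,
    np.mean(train_error[epoch_slice]) * 100,
    np.mean(train_loss[epoch_slice])))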
The error rate drops very quickly during the first iterations, then slows down.
Execution time on a laptop GeForce GTX 980M GPU is 67.3 seconds, while on an Intel i7 CPU it is 3 minutes and 7 seconds.
After a long while, the model converges to a 5.3 - 5.5% error rate. With a few more iterations the error could go further down, but this could also lead to overfitting. Overfitting occurs when the model fits the training data well but does not achieve the same error rate on unseen data.
In this case, the model is too simple to overfit on this data.
A model that is too simple cannot learn very well. The principle of deep learning is to add more layers, that is, increase the depth and build deeper networks to gain better accuracy.
We'll see in the following section how to compute a better estimate of the model's accuracy and when to stop training.