
How to do it...

Now that we understand how the learning rate influences the output values, let's see the impact of the learning rate in action on the MNIST dataset we saw earlier. We keep the same model architecture and change only the learning rate parameter.

Note that we will be using the same data-preprocessing steps as those of step 1 and step 2 in the Scaling input dataset recipe.
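
For reference, a minimal sketch of those two preprocessing steps is shown below (this assumes the standard keras.datasets.mnist loader, as in that recipe; the variable names mirror the ones used later in this section):

from keras.datasets import mnist
from keras.utils import np_utils

# load MNIST and flatten the 28 x 28 images into 784-dimensional vectors
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 784).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 784).astype('float32')

# scale the pixel values to the range [0, 1]
X_train = X_train / 255.
X_test = X_test / 255.

# one-hot encode the labels for the 10 digit classes
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)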

Once we have the dataset preprocessed, we vary the learning rate of the model by specifying the optimizer in the next step:

  1. We change the learning rate as follows:
from keras import optimizers
adam = optimizers.Adam(lr=0.01)

With the preceding code, we have initialized the Adam optimizer with a specified learning rate of 0.01.
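
If you want to double-check the learning rate the optimizer was configured with, one way to read it back (assuming the Keras 2.x backend API) is as follows:

from keras import backend as K
# read back the learning rate variable stored on the optimizer object
print(K.get_value(adam.lr)) # prints 0.01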

  2. We build, compile, and fit the model as follows:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

The accuracy of the preceding network is ~90% at the end of 500 epochs. Let's have a look at how the loss function and accuracy vary over the epochs (the code to generate the plots in the following diagram remains the same as the code we used in step 8 of the Training a vanilla neural network recipe):
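
For convenience, a minimal sketch of that plotting code is reproduced here; depending on your Keras version, the accuracy keys in history.history may be 'acc'/'val_acc' (older versions) or 'accuracy'/'val_accuracy':

import matplotlib.pyplot as plt

# the History object returned by model.fit stores per-epoch metrics
epochs = range(1, len(history.history['loss']) + 1)

plt.subplot(1, 2, 1)
plt.plot(epochs, history.history['loss'], label='train loss')
plt.plot(epochs, history.history['val_loss'], label='test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, history.history['acc'], label='train accuracy')
plt.plot(epochs, history.history['val_acc'], label='test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()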

Note that when the learning rate was high (0.01 in the current scenario, compared to 0.0001 in the scenario considered in the Scaling input dataset recipe), the loss decreased less smoothly than it did for the low-learning-rate model.

The low-learning-rate model updates the weights slowly, resulting in a smoothly decreasing loss function and a high accuracy that is reached gradually, over a larger number of epochs.

The step changes in the loss values at the higher learning rate, on the other hand, occur because the loss gets stuck in a local minimum until the weights jump to better values. A lower learning rate gives a better chance of arriving at the optimal weight values, as the weights are changed slowly, but steadily, in the right direction.
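
To see why the step size matters, consider plain gradient descent on a toy quadratic loss (this is an illustration only, not part of the recipe's code). A small learning rate creeps towards the minimum, a moderate one converges quickly, and an overly large one overshoots and diverges:

def gradient_descent(lr, steps=20, w=5.0):
    # minimize loss(w) = w ** 2, whose gradient is 2 * w and whose minimum is at w = 0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(gradient_descent(lr=0.01)) # ~3.34, still far from 0 after 20 steps
print(gradient_descent(lr=0.1)) # ~0.06, close to the minimum
print(gradient_descent(lr=1.1)) # ~192, the updates have diverged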

In a similar manner, let's explore the network accuracy when the learning rate is as high as 0.1:

from keras import optimizers
adam = optimizers.Adam(lr=0.1)

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

Note that the loss values could not decrease much further, as the learning rate was high; that is, the weights potentially got stuck in a local minimum:

Thus, in general, it is a good idea to set the learning rate to a low value and let the network learn over a large number of epochs.
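
If you want to reproduce this comparison in a single run, a sketch along the following lines trains the same architecture once per learning rate (this assumes the preprocessed X_train, y_train, X_test, and y_test from earlier, and uses fewer epochs and verbose=0 just to keep the run short; the 'val_acc' key may be 'val_accuracy' in newer Keras versions):

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

for lr in [0.0001, 0.01, 0.1]:
    # same two-layer architecture as above, rebuilt from scratch for each learning rate
    model = Sequential()
    model.add(Dense(1000, input_dim=784, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(lr=lr),
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        validation_data=(X_test, y_test),
                        epochs=50, batch_size=1024, verbose=0)
    print(lr, history.history['val_acc'][-1])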
