
How to do it...

Now that we understand how the learning rate influences the output values, let's see its impact in action on the MNIST dataset we used earlier, keeping the same model architecture and changing only the learning rate parameter.

Note that we will be using the same data-preprocessing steps as those of step 1 and step 2 in the Scaling input dataset recipe.
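
For convenience, here is a minimal sketch of that preprocessing, assuming the Scaling input dataset recipe flattens the images into 784-dimensional vectors, scales pixel values to the range [0, 1] by dividing by 255, and one-hot encodes the labels:

from keras.datasets import mnist
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Flatten the 28 x 28 images into 784-dimensional vectors and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 784).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], 784).astype('float32') / 255

# One-hot encode the digit labels
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)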

Once the dataset is preprocessed, we vary the learning rate of the model by specifying it when we define the optimizer in the following steps:

  1. We change the learning rate as follows:
from keras import optimizers
# Adam optimizer with a learning rate of 0.01 (versus 0.0001 earlier)
adam = optimizers.Adam(lr=0.01)

With the preceding code, we have initialized the Adam optimizer with a specified learning rate of 0.01.

  2. We build, compile, and fit the model as follows:
from keras.models import Sequential
from keras.layers import Dense

# Hidden layer with 1,000 units on the 784-dimensional flattened input
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
# Output layer with one unit per digit class
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

The accuracy of the preceding network is ~90% at the end of 500 epochs. Let's have a look at how the loss and accuracy vary over the epochs (the code to generate the following plots is the same as the code we used in step 8 of the Training a vanilla neural network recipe):
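
For reference, a minimal sketch of that plotting code, assuming matplotlib is available and that the recorded metric keys are loss, acc, val_loss, and val_acc (older Keras versions; newer versions use accuracy and val_accuracy):

import matplotlib.pyplot as plt

# history.history holds one list per recorded metric, one value per epoch
epochs = range(1, len(history.history['loss']) + 1)

plt.subplot(1, 2, 1)
plt.plot(epochs, history.history['loss'], label='train loss')
plt.plot(epochs, history.history['val_loss'], label='validation loss')
plt.title('Loss over epochs')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, history.history['acc'], label='train accuracy')
plt.plot(epochs, history.history['val_acc'], label='validation accuracy')
plt.title('Accuracy over epochs')
plt.legend()

plt.show()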

Note that with the higher learning rate (0.01 here, versus 0.0001 in the scenario considered in the Scaling input dataset recipe), the loss decreased less smoothly than it did for the low-learning-rate model.

The low-learning-rate model updates the weights slowly, resulting in a smoothly decreasing loss and a high accuracy that is reached only gradually, over a larger number of epochs.

In contrast, the step changes in loss values at the higher learning rate occur because the loss gets stuck in a local minimum until the weight values jump to better values. A lower learning rate gives a better chance of arriving at the optimal weight values, as the weights are changed slowly, but steadily, in the right direction.
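
As a rough illustration of why this happens, the learning rate directly scales the size of every weight update. The following sketch uses a plain gradient-descent update for simplicity (Adam additionally adapts the step size per parameter), with a made-up weight and gradient:

import numpy as np

w = np.array([0.5])       # a single illustrative weight
grad = np.array([2.0])    # a made-up gradient of the loss with respect to w

for lr in (0.0001, 0.01, 0.1):
    # plain gradient-descent update: the step size grows linearly with the learning rate
    print(lr, w - lr * grad)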

In a similar manner, let's explore the network accuracy when the learning rate is as high as 0.1:

from keras import optimizers
# Adam optimizer with an even higher learning rate of 0.1
adam = optimizers.Adam(lr=0.1)

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

Note that the loss values could not decrease much further, as the learning rate was high; that is, the weights potentially got stuck in a local minimum:

Thus, in general, it is a good idea to set the learning rate to a low value and let the network learn over a large number of epochs.
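
To run this comparison end to end, one possible approach (a sketch, assuming the preprocessed X_train, X_test, y_train, and y_test from the earlier recipe) is to loop over the candidate learning rates and collect one history per value:

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

histories = {}
for lr in (0.0001, 0.01, 0.1):
    model = Sequential()
    model.add(Dense(1000, input_dim=784, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(lr=lr),
                  metrics=['accuracy'])
    # Fewer epochs than in the recipe, to keep the comparison quick
    histories[lr] = model.fit(X_train, y_train,
                              validation_data=(X_test, y_test),
                              epochs=100, batch_size=1024, verbose=0)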
