
Analyzing the effect of activation functions on the accuracy of feedforward networks

In the previous example, we used ReLU as the activation function. TensorFlow supports several other activation functions. Let's look at how each of them affects validation accuracy. We start by generating 1,000 evenly spaced input values between -10 and 10:

x_val = np.linspace(start=-10., stop=10., num=1000)
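These snippets reuse the NumPy, Matplotlib, and TensorFlow imports and the open session from the previous example. If you are running this section on its own, a minimal setup sketch, assuming the TensorFlow 1.x API, would be:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf  # TensorFlow 1.x style API (tf.Session, tf.placeholder)

# Session used to evaluate the activation ops below.
session = tf.Session()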

Then generate the activation output:

# ReLU activation
y_relu = session.run(tf.nn.relu(x_val))
# ReLU-6 activation
y_relu6 = session.run(tf.nn.relu6(x_val))
# Sigmoid activation
y_sigmoid = session.run(tf.nn.sigmoid(x_val))
# Hyperbolic tangent activation
y_tanh = session.run(tf.nn.tanh(x_val))
# Softsign activation
y_softsign = session.run(tf.nn.softsign(x_val))
# Softplus activation
y_softplus = session.run(tf.nn.softplus(x_val))
# Exponential linear unit (ELU) activation
y_elu = session.run(tf.nn.elu(x_val))

Plot the activation outputs against x_val:

plt.plot(x_val, y_softplus, 'r--', label='Softplus', linewidth=2)
plt.plot(x_val, y_relu, 'b:', label='ReLU', linewidth=2)
plt.plot(x_val, y_relu6, 'g-.', label='ReLU6', linewidth=2)
plt.plot(x_val, y_elu, 'k-', label='ELU', linewidth=1)
plt.ylim([-1.5, 7])
plt.legend(loc='upper left')
plt.title('Activation functions', y=1.05)
plt.show()

plt.plot(x_val, y_sigmoid, 'r--', label='Sigmoid', linewidth=2)
plt.plot(x_val, y_tanh, 'b:', label='tanh', linewidth=2)
plt.plot(x_val, y_softsign, 'g-.', label='Softsign', linewidth=2)
plt.ylim([-1.5, 1.5])
plt.legend(loc='upper left')
plt.title('Activation functions with Vanishing Gradient', y=1.05)
plt.show()

The plot comparing Softplus, ReLU, ReLU6, and ELU is shown in the following graph:

The plot comparing the activation functions prone to vanishing gradients (Sigmoid, tanh, and Softsign) is as follows:

Now let's look at how the choice of activation function affects validation accuracy on the NotMNIST data.

We have modified the previous example so that the activation function can be passed as a parameter to main():

RELU = 'RELU'
RELU6 = 'RELU6'
CRELU = 'CRELU'
SIGMOID = 'SIGMOID'
ELU = 'ELU'
SOFTPLUS = 'SOFTPLUS'

def activation(name, features):
    # Map the activation name to the corresponding TensorFlow op.
    if name == RELU:
        return tf.nn.relu(features)
    if name == RELU6:
        return tf.nn.relu6(features)
    if name == SIGMOID:
        return tf.nn.sigmoid(features)
    if name == CRELU:
        return tf.nn.crelu(features)
    if name == ELU:
        return tf.nn.elu(features)
    if name == SOFTPLUS:
        return tf.nn.softplus(features)
    raise ValueError('Unknown activation: %s' % name)
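
The run() function shown next also relies on the reformat() and accuracy() helpers from the previous NotMNIST example; they are not repeated here. For reference, accuracy() is typically written along these lines (a sketch that assumes one-hot encoded labels, as in that example):

def accuracy(predictions, labels):
    # Percentage of samples whose arg-max prediction matches the
    # arg-max of the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])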

The run() function definition encompasses the logic that we defined earlier:

batch_size = 128
# CRELU concatenates ReLU(x) and ReLU(-x), doubling the layer's output width,
# which would require differently shaped weights.
#activations = [RELU, RELU6, SIGMOID, CRELU, ELU, SOFTPLUS]
activations = [RELU, RELU6, SIGMOID, ELU, SOFTPLUS]
plot_loss = False

def run(name):
    print(name)
    with open(pickle_file, 'rb') as f:
        save = pickle.load(f)
        training_dataset = save['train_dataset']
        training_labels = save['train_labels']
        validation_dataset = save['valid_dataset']
        validation_labels = save['valid_labels']
        test_dataset = save['test_dataset']
        test_labels = save['test_labels']
    train_dataset, train_labels = reformat(training_dataset, training_labels)
    valid_dataset, valid_labels = reformat(validation_dataset, validation_labels)
    test_dataset, test_labels = reformat(test_dataset, test_labels)

    graph = tf.Graph()
    no_of_neurons = 1024
    with graph.as_default():
        tf_train_dataset = tf.placeholder(
            tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(
            tf.float32, shape=(batch_size, num_of_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)
        # The variable definitions (w1, b1, w2, b2), the training computation
        # (logits, loss), and the optimizer are the same as in the previous
        # example and are omitted here, except that the hidden layer now uses
        # activation(name, ...) instead of a fixed ReLU.
        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(
            tf.matmul(activation(name, tf.matmul(tf_valid_dataset, w1) + b1), w2) + b2)
        test_prediction = tf.nn.softmax(
            tf.matmul(activation(name, tf.matmul(tf_test_dataset, w1) + b1), w2) + b2)

    num_steps = 101
    minibatch_acc = []
    validation_acc = []
    loss_array = []
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print("Initialized")
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            # Generate a minibatch.
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset: batch_data,
                         tf_train_labels: batch_labels}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            minibatch_accuracy = accuracy(predictions, batch_labels)
            validation_accuracy = accuracy(valid_prediction.eval(), valid_labels)
            if step % 10 == 0:
                print("Minibatch loss at step", step, ":", l)
                print("Minibatch accuracy: %.1f%%" % minibatch_accuracy)
                print("Validation accuracy: %.1f%%" % validation_accuracy)
            minibatch_acc.append(minibatch_accuracy)
            validation_acc.append(validation_accuracy)
            loss_array.append(l)
        print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
        return validation_acc, loss_array
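
The plots that follow are produced by a main() driver that loops over the activations list. main() itself is not reproduced in the text, so the following is only a sketch that assumes the run() function, the activations list, and the plot_loss flag defined above:

def main():
    plt.figure()
    for name in activations:
        validation_acc, loss_array = run(name)
        # Plot either the training loss or the validation accuracy per step.
        values = loss_array if plot_loss else validation_acc
        plt.plot(values, label=name)
    plt.xlabel('Steps')
    plt.ylabel('Loss' if plot_loss else 'Validation accuracy (%)')
    plt.legend(loc='best')
    plt.show()

if __name__ == '__main__':
    main()

Running it once with plot_loss = False and once with plot_loss = True yields the two graphs discussed next.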

The validation accuracy for each activation function in the preceding list is plotted in the following graph:

Validation accuracy for various activation functions

As can be seen in the preceding graph, ReLU and ReLU6 provide the highest validation accuracy, which is close to 60 percent. Now let's look at how the training loss behaves as we progress through the steps for the various activations:

Training loss for various activations as a function of steps

Training loss converges toward zero for most of the activation functions, although ReLU reduces the loss the slowest in the short term.
