The gradient of a scalar-valued function f with respect to a real-valued m x n matrix A is defined as the matrix of partial derivatives of f with respect to the entries of A, and is denoted as follows:
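\nabla_A f(A) =
\begin{bmatrix}
\frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}}
\end{bmatrix}

That is, the (i, j) entry of \nabla_A f(A) is \partial f / \partial A_{ij}.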
TensorFlow does not do numerical differentiation; rather, it supports automatic differentiation. Because the operations are specified as nodes of a TensorFlow graph and the derivative of each individual operation is known, TensorFlow can apply the chain rule through the graph and combine the derivatives automatically.
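As a quick illustration, tf.gradients returns the symbolic derivative of one graph node with respect to another; the following is a minimal sketch (the toy graph b = a * a is invented for illustration):

import tensorflow as tf

a = tf.placeholder(tf.float32)
b = a * a                       # b = a^2, so db/da = 2a
grad = tf.gradients(b, a)       # symbolic gradient built via the chain rule

with tf.Session() as sess:
    print(sess.run(grad, feed_dict={a: 3.0}))   # prints [6.0]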
The following example shows how to train a network on MNIST data. The MNIST database consists of handwritten digits; it has a training set of 60,000 examples and a test set of 10,000 examples, and the digits are size-normalized.
Here, backpropagation is performed without using any optimizer API; the derivatives are calculated manually. We get 913 correct out of 1,000 test images. The backpropagation concept itself will be introduced in the next chapter.
The following code snippet describes how to get the MNIST dataset, define the input and output placeholders, and initialize the weights and biases:
import tensorflow as tf
# get mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x represents image with 784 values as columns (28*28), y represents output digit
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
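The layers defined next use the variables w1, b1, w2, and b2, which can be initialized along the following lines; a minimal sketch, assuming a hidden layer of 30 units and truncated-normal starting values (both the hidden size and the initializer are assumptions):

# weights and biases for a 784 -> 30 -> 10 network (30 hidden units is an assumed size)
w1 = tf.Variable(tf.truncated_normal([784, 30]))
b1 = tf.Variable(tf.truncated_normal([1, 30]))
w2 = tf.Variable(tf.truncated_normal([30, 10]))
b2 = tf.Variable(tf.truncated_normal([1, 10]))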
We now define a two-layer network with a nonlinear sigmoid activation at each layer; a squared loss function is applied and minimized with the backpropagation algorithm, as shown in the following snippet:
# non-linear sigmoid function at each neuron
def sigmoid(x):
    sigma = tf.div(tf.constant(1.0), tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))
    return sigma

# starting from first layer with wx+b, then apply sigmoid to add non-linearity
z1 = tf.add(tf.matmul(x, w1), b1)
a1 = sigmoid(z1)
z2 = tf.add(tf.matmul(a1, w2), b2)
a2 = sigmoid(z2)

# calculate the loss (delta)
loss = tf.subtract(a2, y)

# derivative of the sigmoid function der(sigmoid)=sigmoid*(1-sigmoid)
def sigmaprime(x):
    return tf.multiply(sigmoid(x), tf.subtract(tf.constant(1.0), sigmoid(x)))
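The training loop that follows runs step and acct_res inside a session sess, none of which appear in the snippet above. A minimal sketch of the manual backward pass, parameter updates, accuracy measure, and session setup that the loop relies on (the learning rate of 0.5 and the mean reduction for the bias gradients are assumptions):

# backward pass: propagate the delta through the chain rule, using sigmaprime
d_z2 = tf.multiply(loss, sigmaprime(z2))
d_w2 = tf.matmul(tf.transpose(a1), d_z2)
d_b2 = d_z2

d_a1 = tf.matmul(d_z2, tf.transpose(w2))
d_z1 = tf.multiply(d_a1, sigmaprime(z1))
d_w1 = tf.matmul(tf.transpose(x), d_z1)
d_b1 = d_z1

# gradient-descent update applied manually with tf.assign (learning rate 0.5 is an assumption)
eta = tf.constant(0.5)
step = [
    tf.assign(w1, tf.subtract(w1, tf.multiply(eta, d_w1))),
    tf.assign(b1, tf.subtract(b1, tf.multiply(eta, tf.reduce_mean(d_b1, axis=[0])))),
    tf.assign(w2, tf.subtract(w2, tf.multiply(eta, d_w2))),
    tf.assign(b2, tf.subtract(b2, tf.multiply(eta, tf.reduce_mean(d_b2, axis=[0]))))
]

# count how many of the test digits are classified correctly
acct_mat = tf.equal(tf.argmax(a2, 1), tf.argmax(y, 1))
acct_res = tf.reduce_sum(tf.cast(acct_mat, tf.float32))

sess = tf.Session()
sess.run(tf.global_variables_initializer())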
for i in range(10000):
    batch_xs, batch_ys = data.train.next_batch(10)
    sess.run(step, feed_dict={x: batch_xs, y: batch_ys})
    if i % 1000 == 0:
        res = sess.run(acct_res, feed_dict={x: data.test.images[:1000], y: data.test.labels[:1000]})
        print(res)
Now, let's use automatic differentiation with TensorFlow. The following example demonstrates the use of GradientDescentOptimizer. We get 924 correct out of 1,000 test images.
import tensorflow as tf
# get mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x represents image with 784 values as columns (28*28), y represents output digit
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
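As before, the network uses w1, b1, w2, and b2; assuming the same 30-unit hidden layer and truncated-normal initialization as in the previous sketch:

w1 = tf.Variable(tf.truncated_normal([784, 30]))
b1 = tf.Variable(tf.truncated_normal([1, 30]))
w2 = tf.Variable(tf.truncated_normal([30, 10]))
b2 = tf.Variable(tf.truncated_normal([1, 10]))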
# non-linear sigmoid function at each neuron
def sigmoid(x):
    sigma = tf.div(tf.constant(1.0), tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))
    return sigma

# starting from first layer with wx+b, then apply sigmoid to add non-linearity
z1 = tf.add(tf.matmul(x, w1), b1)
a1 = sigmoid(z1)
z2 = tf.add(tf.matmul(a1, w2), b2)
a2 = sigmoid(z2)

# calculate the loss (delta)
loss = tf.subtract(a2, y)

# derivative of the sigmoid function der(sigmoid)=sigmoid*(1-sigmoid)
def sigmaprime(x):
    return tf.multiply(sigmoid(x), tf.subtract(tf.constant(1.0), sigmoid(x)))
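The training loop below again references step, acct_res, and sess. In this version the update step comes from tf.train.GradientDescentOptimizer, which differentiates the cost automatically; a minimal sketch, where the squared-error cost expression and the learning rate of 0.1 are assumptions:

# squared-error cost built from the delta above (the exact reduction is an assumption)
cost = tf.reduce_mean(tf.multiply(loss, loss))

# TensorFlow's automatic differentiation computes the gradients and applies the update
step = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

# count how many of the test digits are classified correctly
acct_mat = tf.equal(tf.argmax(a2, 1), tf.argmax(y, 1))
acct_res = tf.reduce_sum(tf.cast(acct_mat, tf.float32))

sess = tf.Session()
sess.run(tf.global_variables_initializer())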
for i in range(10000):
    batch_xs, batch_ys = data.train.next_batch(10)
    sess.run(step, feed_dict={x: batch_xs, y: batch_ys})
    if i % 1000 == 0:
        res = sess.run(acct_res, feed_dict={x: data.test.images[:1000], y: data.test.labels[:1000]})
        print(res)