Analyzing Network Output - Part 1, Training and Saving

Posted on Thu 12 April 2018 in Tutorial by Corey Adams

MNIST Classification Trainer

This is a pretty simple notebook. We'll train a convolutional network to classify MNIST digits. The point here is not to teach too much about training a network, but to show how to properly save and restore a modern network with TensorFlow, and how to analyze the output.

For a coherent set of tutorials that runs in your web browser with Google's free GPUs, check this out. We are compiling blog posts like this one into those tutorials when we find time!

Define the network

I'm going to use some tools copied and pasted from other projects, so some of this code might look familiar.

First, let's import what we need for training:

In [63]:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

For this problem, I'm going to use a short non-residual network with batch normalization, and I'll train it with minibatching (accumulating gradients over several small batches before each update). Both techniques are overkill here, but they're instructive for demonstrating the pitfalls of minibatching and batch normalization when saving a network.

Convolutional network

MNIST images are 28x28, so we'll apply convolutions and downsampling, and finish with global average pooling:

In [23]:
def convolutional_step(x, training):

    # Double the number of filters relative to the input:
    n_filters = 2*x.get_shape().as_list()[-1]

    x = tf.layers.conv2d(x, filters=n_filters,
                        kernel_size=[3,3],
                        strides=[1,1],
                        padding='same',
                        use_bias=False,
                        reuse=False,
                        trainable=training)

    # Here's an important gotcha: I set the decay to 0.9, and updates_collections to None.
    # This forces the update to happen "in place" and makes sure the batch norm parameters
    # get saved to files.
    x = tf.contrib.layers.batch_norm(x,
                                     updates_collections=None,
                                     decay=0.9,
                                     is_training=training,
                                     trainable=training,
                                     # name="BatchNorm",
                                     reuse=False)
    x = tf.nn.relu(x)
    return x

def build_network(x, training):
    
    print("Building network, initial shape: {0}".format(x.get_shape()))
        
    # Initial convolutions:
    x = convolutional_step(x, training)
    x = convolutional_step(x, training)
    
    # Downsample to 14x14:
    x = tf.layers.max_pooling2d(x,
                                 pool_size=2,
                                 strides=2,
                                 padding='valid')
    
    print("After first downsample shape: {0}".format(x.get_shape()))
    
    # More convolutions:
    x = convolutional_step(x, training)
    x = convolutional_step(x, training)
    
    # Downsample to 7x7:
    x = tf.layers.max_pooling2d(x,
                                 pool_size=2,
                                 strides=2,
                                 padding='valid')
    
    print("After first downsample shape: {0}".format(x.get_shape()))
    
    # More convolutions:
    x = convolutional_step(x, training)
    x = convolutional_step(x, training)
    
    # Downsample to 3x3:
    x = tf.layers.max_pooling2d(x,
                                 pool_size=2,
                                 strides=2,
                                 padding='valid')
    
    print("After last downsample shape: {0}".format(x.get_shape()))
    
    # Do a bottleneck step to merge down to just 10 filters:
    
    x = tf.layers.conv2d(x,filters=10,
                        kernel_size=[1,1],
                        strides=[1,1],
                        padding='same',
                        use_bias=False,
                        reuse=False,
                        trainable=training
                        )
    
    
    # Do global average pooling to make 10 output logits:
    shape = (x.shape[1], x.shape[2])

    x = tf.nn.pool(x,
                   window_shape=shape,
                   pooling_type="AVG",
                   padding="VALID",
                   dilation_rate=None,
                   strides=None,
                   name="GlobalAveragePool",
                   data_format=None)

    # Reshape to remove empty dimensions:
    x = tf.reshape(x, [tf.shape(x)[0], 10],
                   name="global_pooling_reshape")

    
    return x
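
The batch norm gotcha in the comments above is worth a second look. The more common TensorFlow 1.x pattern (a sketch only, and deliberately not what this notebook does) leaves the moving mean and variance updates in the UPDATE_OPS collection, and you then have to attach them to the training op yourself; forget that step and the moving statistics never update, so the restored network misbehaves at inference time:

# Sketch only: the usual UPDATE_OPS pattern, which the batch_norm call above
# deliberately sidesteps by passing updates_collections=None.
# With the default collection, the moving mean/variance update ops end up here
# and have to be attached to the training step, or they silently never run.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # cross_entropy is the loss defined further down in this notebook.
    train_op = tf.train.AdamOptimizer().minimize(cross_entropy)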

Set up the network:

In [47]:
tf.reset_default_graph()
# Define the input placeholders, as defined in the tensorflow mnist tutorial:
x  = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, 10], name='y_')

reshaped_x = tf.reshape(x, [-1, 28, 28, 1])

y = build_network(reshaped_x, training=True)
print "Final output shape: {0}".format(y.get_shape())
Building network, initial shape: (?, 28, 28, 1)
After first downsample shape: (?, 14, 14, 4)
After second downsample shape: (?, 7, 7, 16)
After last downsample shape: (?, 3, 3, 64)
Final output shape: (?, 10)
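
(Note that the last pooling step takes the 7x7 feature maps to 3x3: with valid padding and stride 2, the output size is floor(7/2) = 3.)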

Loss and optimizers

Again, a lot of this is straight out of the TensorFlow tutorial:

In [48]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
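
One small API note: later TensorFlow 1.x releases deprecate tf.nn.softmax_cross_entropy_with_logits in favor of the _v2 variant, and if your labels were integer class indices rather than one-hot vectors you would reach for the sparse form instead. A rough sketch, with a hypothetical placeholder y_idx standing in for integer labels:

# Hypothetical variant: integer class labels instead of one-hot vectors.
y_idx = tf.placeholder(tf.int64, shape=[None], name='y_idx')
sparse_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_idx, logits=y))
sparse_correct = tf.equal(tf.argmax(y, 1), y_idx)
sparse_accuracy = tf.reduce_mean(tf.cast(sparse_correct, tf.float32))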

Optimizer, using minibatching:

In [49]:
# Global step:
global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

# Non-trainable copies of the trainable variables, used to accumulate gradients:
trainable_vars = [tf.Variable(tv.initialized_value(), trainable=False) for tv in tf.trainable_variables()]

opt  = tf.train.AdamOptimizer()
# Reset gradients:
zero_grads = [tv.assign(tf.zeros_like(tv)) for tv in trainable_vars]

# Accumulate gradients:
accum_gradients = [trainable_vars[i].assign_add(gv[0]) for i, gv in enumerate(opt.compute_gradients(cross_entropy))]

# Apply gradients:
apply_gradients = opt.apply_gradients(zip(trainable_vars, tf.trainable_variables()),
                    global_step = global_step)
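
One thing to be aware of: the accumulated gradients are summed over the minibatches, not averaged, so the effective step size grows with minibatch_count. If you wanted the update to be independent of how many minibatches you accumulate, one option (a sketch, not what was run here) is to scale the accumulators down before applying them; minibatch_count refers to the value set with the training loop below:

# Sketch: average the accumulated gradients over the number of minibatches,
# so the effective learning rate does not depend on minibatch_count
# (minibatch_count = 3 in the training setup below).
average_gradients = [tv.assign(tv / float(minibatch_count))
                     for tv in trainable_vars]
# sess.run(average_gradients) would go after the accumulation loop,
# just before sess.run(apply_gradients).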

Saving the network:

In [50]:
writer = tf.train.Saver()
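
Since the whole point is to make sure the batch normalization statistics end up in the checkpoint, it's worth peeking at what the Saver will actually write. A quick sketch; the exact variable names depend on TensorFlow's auto-generated scopes, so treat the strings below as an assumption:

# Check which variables will land in the checkpoint; in particular, the batch
# norm moving statistics (moving_mean / moving_variance) should be among them.
for v in tf.global_variables():
    if 'moving_' in v.name:
        print("Will save batch norm statistic: {0}, shape {1}".format(v.name, v.get_shape()))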

Training

Not going to attempt anything crazy here: just train for 1000 iterations, with 25 images per minibatch and 3 minibatches per iteration.

In [85]:
iterations      = 1000
minibatch_size  = 25
minibatch_count = 3

test_iteration = 5
snapshot_iteration = 50
train_losses = []
test_losses  = []
train_accs   = []
test_accs    = []
train_iters  = []
test_iters   = []

save_dir     = '/home/cadams/DeepLearnPhysics/mnist-train-and-analyze' + "/log_mnist_classifier/"
if not os.path.isdir(save_dir):
    os.mkdir(save_dir)
print(save_dir)
/home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/
In [86]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _it in range(iterations + 1):
    
    # Zero gradients:
    sess.run(zero_grads)
    batch_loss = 0.0
    batch_acc = 0.0
    for _ in range(minibatch_count):

        # Fetch the training data:
        batch = mnist.train.next_batch(minibatch_size)
        ops = [accum_gradients, cross_entropy, accuracy]
        _, loss, acc = sess.run(ops, feed_dict={x: batch[0], y_ : batch[1]})
        batch_loss += loss
        batch_acc  += acc
        
    # Apply the gradients:
    sess.run(apply_gradients)
    train_iters.append(_it)
    train_losses.append(batch_loss / minibatch_count)
    train_accs.append(batch_acc / minibatch_count)
    
    if _it % test_iteration == 0:
        # Test on the test set:
        batch = mnist.test.next_batch(10*minibatch_size)
        test_loss, test_acc = sess.run([cross_entropy, accuracy], feed_dict={x:batch[0], y_:batch[1]})
        test_iters.append(_it)
        test_losses.append(test_loss)
        test_accs.append(test_acc)
        # Report the results:
        print("Iteration {it}".format(it=_it))
        print("Loss:\t {test:.2}(test)\t{train:.2}(train)".format(test=test_loss, 
                                                                   train=batch_loss/minibatch_count))
        print("Acc: \t {test:.2}(test)\t{train:.2}(train)".format(test=test_acc,  
                                                                   train=batch_acc /minibatch_count))
        
    if _it != 0 and _it % snapshot_iteration == 0:
        writer.save(sess, save_path=save_dir + 'checkpoints/save', global_step=_it)
Iteration 0
Loss:	 2.8(test)	3.3(train)
Acc: 	 0.076(test)	0.053(train)
Iteration 5
Loss:	 2.0(test)	2.3(train)
Acc: 	 0.41(test)	0.24(train)
Iteration 10
Loss:	 1.7(test)	1.9(train)
Acc: 	 0.52(test)	0.45(train)
Iteration 15
Loss:	 1.5(test)	1.7(train)
Acc: 	 0.62(test)	0.53(train)
Iteration 20
Loss:	 1.3(test)	1.3(train)
Acc: 	 0.73(test)	0.75(train)
Iteration 25
Loss:	 1.2(test)	1.2(train)
Acc: 	 0.74(test)	0.64(train)
Iteration 30
Loss:	 1.2(test)	1.2(train)
Acc: 	 0.7(test)	0.75(train)
Iteration 35
Loss:	 0.97(test)	0.99(train)
Acc: 	 0.79(test)	0.8(train)
Iteration 40
Loss:	 0.89(test)	0.87(train)
Acc: 	 0.85(test)	0.87(train)
Iteration 45
Loss:	 0.88(test)	0.87(train)
Acc: 	 0.84(test)	0.8(train)
Iteration 50
Loss:	 0.79(test)	0.68(train)
Acc: 	 0.86(test)	0.89(train)
WARNING:tensorflow:Ignoring: /home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/train/checkpoints; No such file or directory
Iteration 55
Loss:	 0.73(test)	0.74(train)
Acc: 	 0.89(test)	0.84(train)
Iteration 60
Loss:	 0.63(test)	0.72(train)
Acc: 	 0.9(test)	0.84(train)
Iteration 65
Loss:	 0.63(test)	0.69(train)
Acc: 	 0.87(test)	0.88(train)
Iteration 70
Loss:	 0.52(test)	0.62(train)
Acc: 	 0.93(test)	0.85(train)
Iteration 75
Loss:	 0.51(test)	0.46(train)
Acc: 	 0.9(test)	0.96(train)
Iteration 80
Loss:	 0.47(test)	0.55(train)
Acc: 	 0.91(test)	0.88(train)
Iteration 85
Loss:	 0.44(test)	0.5(train)
Acc: 	 0.93(test)	0.95(train)
Iteration 90
Loss:	 0.44(test)	0.55(train)
Acc: 	 0.9(test)	0.88(train)
Iteration 95
Loss:	 0.38(test)	0.36(train)
Acc: 	 0.92(test)	0.96(train)
Iteration 100
Loss:	 0.38(test)	0.4(train)
Acc: 	 0.92(test)	0.92(train)
WARNING:tensorflow:Ignoring: /home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/train/checkpoints; No such file or directory
Iteration 105
Loss:	 0.35(test)	0.39(train)
Acc: 	 0.92(test)	0.95(train)
Iteration 110
Loss:	 0.33(test)	0.47(train)
Acc: 	 0.94(test)	0.87(train)
Iteration 115
Loss:	 0.34(test)	0.33(train)
Acc: 	 0.94(test)	0.93(train)
Iteration 120
Loss:	 0.32(test)	0.29(train)
Acc: 	 0.93(test)	0.92(train)
Iteration 125
Loss:	 0.32(test)	0.37(train)
Acc: 	 0.95(test)	0.93(train)
Iteration 130
Loss:	 0.33(test)	0.42(train)
Acc: 	 0.94(test)	0.91(train)
Iteration 135
Loss:	 0.31(test)	0.33(train)
Acc: 	 0.92(test)	0.93(train)
Iteration 140
Loss:	 0.26(test)	0.34(train)
Acc: 	 0.95(test)	0.95(train)
Iteration 145
Loss:	 0.33(test)	0.29(train)
Acc: 	 0.92(test)	0.99(train)
Iteration 150
Loss:	 0.32(test)	0.25(train)
Acc: 	 0.93(test)	0.96(train)
WARNING:tensorflow:Ignoring: /home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/train/checkpoints; No such file or directory
Iteration 155
Loss:	 0.25(test)	0.27(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 160
Loss:	 0.24(test)	0.32(train)
Acc: 	 0.95(test)	0.91(train)
Iteration 165
Loss:	 0.27(test)	0.29(train)
Acc: 	 0.94(test)	0.95(train)
Iteration 170
Loss:	 0.17(test)	0.3(train)
Acc: 	 0.97(test)	0.92(train)
Iteration 175
Loss:	 0.19(test)	0.26(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 180
Loss:	 0.22(test)	0.2(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 185
Loss:	 0.18(test)	0.23(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 190
Loss:	 0.19(test)	0.24(train)
Acc: 	 0.95(test)	0.95(train)
Iteration 195
Loss:	 0.23(test)	0.16(train)
Acc: 	 0.93(test)	0.99(train)
Iteration 200
Loss:	 0.2(test)	0.31(train)
Acc: 	 0.96(test)	0.89(train)
WARNING:tensorflow:Ignoring: /home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/train/checkpoints; No such file or directory
Iteration 205
Loss:	 0.17(test)	0.23(train)
Acc: 	 0.95(test)	0.91(train)
Iteration 210
Loss:	 0.24(test)	0.28(train)
Acc: 	 0.93(test)	0.91(train)
Iteration 215
Loss:	 0.17(test)	0.2(train)
Acc: 	 0.97(test)	0.93(train)
Iteration 220
Loss:	 0.17(test)	0.19(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 225
Loss:	 0.16(test)	0.28(train)
Acc: 	 0.96(test)	0.93(train)
Iteration 230
Loss:	 0.14(test)	0.16(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 235
Loss:	 0.2(test)	0.22(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 240
Loss:	 0.19(test)	0.26(train)
Acc: 	 0.95(test)	0.92(train)
Iteration 245
Loss:	 0.17(test)	0.18(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 250
Loss:	 0.18(test)	0.25(train)
Acc: 	 0.96(test)	0.93(train)
WARNING:tensorflow:Ignoring: /home/cadams/DeepLearnPhysics/mnist-train-and-analyze/log_mnist_classifier/train/checkpoints; No such file or directory
Iteration 255
Loss:	 0.14(test)	0.13(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 260
Loss:	 0.16(test)	0.15(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 265
Loss:	 0.2(test)	0.3(train)
Acc: 	 0.95(test)	0.92(train)
Iteration 270
Loss:	 0.15(test)	0.18(train)
Acc: 	 0.96(test)	0.93(train)
Iteration 275
Loss:	 0.18(test)	0.13(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 280
Loss:	 0.13(test)	0.11(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 285
Loss:	 0.16(test)	0.2(train)
Acc: 	 0.96(test)	0.93(train)
Iteration 290
Loss:	 0.17(test)	0.15(train)
Acc: 	 0.95(test)	0.97(train)
Iteration 295
Loss:	 0.14(test)	0.11(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 300
Loss:	 0.14(test)	0.15(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 305
Loss:	 0.11(test)	0.29(train)
Acc: 	 0.98(test)	0.91(train)
Iteration 310
Loss:	 0.11(test)	0.25(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 315
Loss:	 0.12(test)	0.11(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 320
Loss:	 0.15(test)	0.18(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 325
Loss:	 0.11(test)	0.22(train)
Acc: 	 0.99(test)	0.95(train)
Iteration 330
Loss:	 0.092(test)	0.2(train)
Acc: 	 0.99(test)	0.95(train)
Iteration 335
Loss:	 0.13(test)	0.18(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 340
Loss:	 0.11(test)	0.14(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 345
Loss:	 0.13(test)	0.14(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 350
Loss:	 0.12(test)	0.17(train)
Acc: 	 0.97(test)	0.95(train)
Iteration 355
Loss:	 0.14(test)	0.18(train)
Acc: 	 0.96(test)	0.95(train)
Iteration 360
Loss:	 0.089(test)	0.11(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 365
Loss:	 0.11(test)	0.15(train)
Acc: 	 0.97(test)	0.92(train)
Iteration 370
Loss:	 0.13(test)	0.14(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 375
Loss:	 0.12(test)	0.11(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 380
Loss:	 0.085(test)	0.11(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 385
Loss:	 0.12(test)	0.12(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 390
Loss:	 0.13(test)	0.056(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 395
Loss:	 0.14(test)	0.11(train)
Acc: 	 0.95(test)	0.99(train)
Iteration 400
Loss:	 0.13(test)	0.14(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 405
Loss:	 0.11(test)	0.067(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 410
Loss:	 0.12(test)	0.13(train)
Acc: 	 0.97(test)	0.97(train)
Iteration 415
Loss:	 0.092(test)	0.085(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 420
Loss:	 0.14(test)	0.16(train)
Acc: 	 0.96(test)	0.93(train)
Iteration 425
Loss:	 0.14(test)	0.12(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 430
Loss:	 0.093(test)	0.083(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 435
Loss:	 0.096(test)	0.14(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 440
Loss:	 0.12(test)	0.16(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 445
Loss:	 0.13(test)	0.12(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 450
Loss:	 0.093(test)	0.081(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 455
Loss:	 0.092(test)	0.11(train)
Acc: 	 0.97(test)	0.97(train)
Iteration 460
Loss:	 0.072(test)	0.071(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 465
Loss:	 0.13(test)	0.091(train)
Acc: 	 0.96(test)	1.0(train)
Iteration 470
Loss:	 0.11(test)	0.21(train)
Acc: 	 0.98(test)	0.93(train)
Iteration 475
Loss:	 0.1(test)	0.094(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 480
Loss:	 0.084(test)	0.079(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 485
Loss:	 0.12(test)	0.14(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 490
Loss:	 0.053(test)	0.11(train)
Acc: 	 1.0(test)	0.99(train)
Iteration 495
Loss:	 0.11(test)	0.14(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 500
Loss:	 0.08(test)	0.1(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 505
Loss:	 0.1(test)	0.12(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 510
Loss:	 0.058(test)	0.084(train)
Acc: 	 0.99(test)	0.97(train)
Iteration 515
Loss:	 0.088(test)	0.2(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 520
Loss:	 0.094(test)	0.052(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 525
Loss:	 0.086(test)	0.054(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 530
Loss:	 0.11(test)	0.1(train)
Acc: 	 0.97(test)	0.97(train)
Iteration 535
Loss:	 0.1(test)	0.085(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 540
Loss:	 0.11(test)	0.097(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 545
Loss:	 0.1(test)	0.2(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 550
Loss:	 0.11(test)	0.076(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 555
Loss:	 0.073(test)	0.16(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 560
Loss:	 0.072(test)	0.1(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 565
Loss:	 0.11(test)	0.071(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 570
Loss:	 0.083(test)	0.13(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 575
Loss:	 0.082(test)	0.084(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 580
Loss:	 0.11(test)	0.26(train)
Acc: 	 0.95(test)	0.95(train)
Iteration 585
Loss:	 0.056(test)	0.21(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 590
Loss:	 0.12(test)	0.086(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 595
Loss:	 0.088(test)	0.21(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 600
Loss:	 0.072(test)	0.14(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 605
Loss:	 0.094(test)	0.14(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 610
Loss:	 0.058(test)	0.063(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 615
Loss:	 0.11(test)	0.066(train)
Acc: 	 0.96(test)	0.97(train)
Iteration 620
Loss:	 0.055(test)	0.12(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 625
Loss:	 0.12(test)	0.12(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 630
Loss:	 0.071(test)	0.043(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 635
Loss:	 0.097(test)	0.11(train)
Acc: 	 0.97(test)	0.96(train)
Iteration 640
Loss:	 0.082(test)	0.097(train)
Acc: 	 0.99(test)	0.97(train)
Iteration 645
Loss:	 0.093(test)	0.044(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 650
Loss:	 0.091(test)	0.13(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 655
Loss:	 0.069(test)	0.079(train)
Acc: 	 0.99(test)	0.97(train)
Iteration 660
Loss:	 0.075(test)	0.061(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 665
Loss:	 0.046(test)	0.058(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 670
Loss:	 0.098(test)	0.055(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 675
Loss:	 0.08(test)	0.18(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 680
Loss:	 0.073(test)	0.066(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 685
Loss:	 0.084(test)	0.028(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 690
Loss:	 0.089(test)	0.073(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 695
Loss:	 0.065(test)	0.11(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 700
Loss:	 0.12(test)	0.03(train)
Acc: 	 0.96(test)	1.0(train)
Iteration 705
Loss:	 0.036(test)	0.043(train)
Acc: 	 1.0(test)	0.99(train)
Iteration 710
Loss:	 0.082(test)	0.028(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 715
Loss:	 0.088(test)	0.075(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 720
Loss:	 0.096(test)	0.077(train)
Acc: 	 0.97(test)	0.97(train)
Iteration 725
Loss:	 0.08(test)	0.15(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 730
Loss:	 0.045(test)	0.05(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 735
Loss:	 0.07(test)	0.041(train)
Acc: 	 0.99(test)	1.0(train)
Iteration 740
Loss:	 0.092(test)	0.056(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 745
Loss:	 0.06(test)	0.084(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 750
Loss:	 0.076(test)	0.09(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 755
Loss:	 0.095(test)	0.19(train)
Acc: 	 0.97(test)	0.97(train)
Iteration 760
Loss:	 0.063(test)	0.05(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 765
Loss:	 0.082(test)	0.12(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 770
Loss:	 0.085(test)	0.098(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 775
Loss:	 0.065(test)	0.069(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 780
Loss:	 0.073(test)	0.097(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 785
Loss:	 0.072(test)	0.089(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 790
Loss:	 0.069(test)	0.038(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 795
Loss:	 0.061(test)	0.032(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 800
Loss:	 0.078(test)	0.16(train)
Acc: 	 0.98(test)	0.92(train)
Iteration 805
Loss:	 0.068(test)	0.15(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 810
Loss:	 0.077(test)	0.057(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 815
Loss:	 0.043(test)	0.075(train)
Acc: 	 1.0(test)	0.99(train)
Iteration 820
Loss:	 0.065(test)	0.038(train)
Acc: 	 0.99(test)	1.0(train)
Iteration 825
Loss:	 0.049(test)	0.021(train)
Acc: 	 0.99(test)	1.0(train)
Iteration 830
Loss:	 0.056(test)	0.085(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 835
Loss:	 0.085(test)	0.077(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 840
Loss:	 0.082(test)	0.061(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 845
Loss:	 0.03(test)	0.072(train)
Acc: 	 1.0(test)	0.99(train)
Iteration 850
Loss:	 0.067(test)	0.12(train)
Acc: 	 0.98(test)	0.96(train)
Iteration 855
Loss:	 0.069(test)	0.05(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 860
Loss:	 0.053(test)	0.13(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 865
Loss:	 0.058(test)	0.11(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 870
Loss:	 0.086(test)	0.018(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 875
Loss:	 0.093(test)	0.045(train)
Acc: 	 0.97(test)	1.0(train)
Iteration 880
Loss:	 0.058(test)	0.14(train)
Acc: 	 0.98(test)	0.95(train)
Iteration 885
Loss:	 0.1(test)	0.058(train)
Acc: 	 0.96(test)	0.99(train)
Iteration 890
Loss:	 0.038(test)	0.12(train)
Acc: 	 0.99(test)	0.97(train)
Iteration 895
Loss:	 0.051(test)	0.069(train)
Acc: 	 1.0(test)	0.97(train)
Iteration 900
Loss:	 0.054(test)	0.095(train)
Acc: 	 0.99(test)	0.97(train)
Iteration 905
Loss:	 0.071(test)	0.029(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 910
Loss:	 0.068(test)	0.056(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 915
Loss:	 0.063(test)	0.058(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 920
Loss:	 0.054(test)	0.13(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 925
Loss:	 0.062(test)	0.14(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 930
Loss:	 0.099(test)	0.11(train)
Acc: 	 0.96(test)	0.96(train)
Iteration 935
Loss:	 0.042(test)	0.14(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 940
Loss:	 0.078(test)	0.11(train)
Acc: 	 0.97(test)	0.99(train)
Iteration 945
Loss:	 0.05(test)	0.039(train)
Acc: 	 0.98(test)	0.99(train)
Iteration 950
Loss:	 0.051(test)	0.041(train)
Acc: 	 0.99(test)	0.99(train)
Iteration 955
Loss:	 0.053(test)	0.028(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 960
Loss:	 0.06(test)	0.14(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 965
Loss:	 0.033(test)	0.1(train)
Acc: 	 1.0(test)	0.95(train)
Iteration 970
Loss:	 0.071(test)	0.026(train)
Acc: 	 0.98(test)	1.0(train)
Iteration 975
Loss:	 0.095(test)	0.09(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 980
Loss:	 0.043(test)	0.074(train)
Acc: 	 1.0(test)	0.97(train)
Iteration 985
Loss:	 0.054(test)	0.12(train)
Acc: 	 0.99(test)	0.96(train)
Iteration 990
Loss:	 0.097(test)	0.039(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 995
Loss:	 0.074(test)	0.12(train)
Acc: 	 0.98(test)	0.97(train)
Iteration 1000
Loss:	 0.058(test)	0.16(train)
Acc: 	 0.98(test)	0.93(train)

It looks like training went pretty well. Let's plot the training and test accuracy (and loss) to verify:

In [91]:
from matplotlib import pyplot as plt
%matplotlib inline
In [96]:
figure = plt.figure(figsize=(16,9))
plt.plot(train_iters, train_accs, label="Training Accuracy")
plt.plot(test_iters, test_accs, label="Testing Accuracy")
plt.grid(True)
plt.legend(fontsize=25)
plt.ylim(0,1.25)
plt.show()
In [97]:
figure = plt.figure(figsize=(16,9))
plt.plot(train_iters, train_losses, label="Training Loss")
plt.plot(test_iters, test_losses, label="Testing Loss")
plt.grid(True)
plt.legend(fontsize=25)
plt.ylim(0,2.5)
plt.show()

Based on these plots, things look good. The accuracy isn't state of the art, but it's pretty good, so let's move on to the second part of the tutorial: restoring the model from file and analyzing its output.
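
As a preview of part 2, here's a minimal sketch of how the newest checkpoint could be located and restored, assuming the same save_dir as above and a freshly rebuilt graph (e.g. build_network(..., training=False)) before the Saver is constructed:

# Sketch: locate and restore the newest checkpoint written during training.
latest = tf.train.latest_checkpoint(save_dir + 'checkpoints/')
restorer = tf.train.Saver()
with tf.Session() as restore_sess:
    restorer.restore(restore_sess, latest)
    print("Restored weights from {0}".format(latest))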