## A Pre-Trained Convolutional Network

To illustrate the ideas of convolutional neural networks more concretely, we will experiment with a complete, pre-trained network available in Keras, called VGG16.  The VGG16 network was developed by the Visual Geometry Group at Oxford, and was trained on the large-scale ImageNet dataset, consisting of 1.4 million labeled images from 1,000 different categories.  Most of the images are of animals or other everyday objects, including many different breeds of cats and dogs.  The diagram below shows the architecture of the network along with the sizes and types of each layer.  For more detailed information, see the paper [K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition (2014)](https://arxiv.org/abs/1409.1556).

<img src="http://science.slc.edu/jmarshall/bioai/images/cnn/vgg16_architecture.jpg" width="75%">

In [None]:
from tensorflow.keras.applications import VGG16

In [None]:
vgg16 = VGG16(weights='imagenet')

Compare the layer summary below with the architecture diagram above.

In [None]:
vgg16.summary()

The network structure is organized into "blocks", each consisting of two or three convolutional layers followed by a 2 &times; 2 max-pooling layer that halves the width and height of the resulting feature maps.  Each layer in the first block produces 64 different feature maps; each layer in the second block, 128; each layer in the third block, 256; and each layer in the fourth and fifth blocks, 512.  The schematic below shows the structure of each block, with convolutional layers in gray and max-pooling layers in red.  In the diagram, rectangle height reflects the relative *size* of the feature maps created at each layer, and rectangle width reflects the *number* of feature maps created.  As information flows through the network, the original input image is gradually transformed into a much smaller but more abstract representation of the image.  The output of the final max-pooling layer in block 5 is a set of 512 feature maps, each of size 7 &times; 7.  These feature maps then get "flattened" into a single vector of 7 &times; 7 &times; 512 = 25,088 values, which is fed into the final densely-connected classification layers of the network: two layers of 4,096 units each, followed by a 1,000-unit softmax output layer.

<img src="http://science.slc.edu/jmarshall/bioai/images/cnn/vgg16_blocks.png" width="65%">
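
The shape bookkeeping for the blocks can be checked with a few lines of plain Python.  This is only a sketch of the arithmetic, not a use of the Keras model itself: the helper name `trace_shapes` is our own, and the layer names simply follow the `blockN_convM` / `blockN_pool` pattern printed by `vgg16.summary()`.

```python
# (number of conv layers, number of feature maps) for each of the five blocks
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def trace_shapes(input_size=224, blocks=blocks):
    """Return a list of (layer_name, height, width, channels) tuples."""
    size = input_size
    shapes = []
    for b, (n_convs, n_filters) in enumerate(blocks, start=1):
        for c in range(1, n_convs + 1):
            # 3x3 convolutions with 'same' padding leave height and width unchanged
            shapes.append((f'block{b}_conv{c}', size, size, n_filters))
        # 2x2 max-pooling with stride 2 halves height and width
        size //= 2
        shapes.append((f'block{b}_pool', size, size, n_filters))
    return shapes

shapes = trace_shapes()
for name, h, w, d in shapes:
    print(f'{name:14s} {h:3d} x {w:3d} x {d:3d}')

# length of the vector obtained by flattening the block-5 output
_, h, w, d = shapes[-1]
flattened_length = h * w * d
print('flattened length:', flattened_length)
```

The printed table should agree with the output shapes listed by `vgg16.summary()` for the convolutional and pooling layers.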

Let's feed a few input images to this pre-trained network and examine the resulting feature maps that get created on each intermediate layer.  First, we will download a set of images of cats and dogs, each of size 224 &times; 224 pixels.

In [None]:
!curl -O science.slc.edu/jmarshall/bioai/data/cats_dogs_100_100_224x224.npz

In [None]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (2.5,2.5)

In [None]:
f = np.load('cats_dogs_100_100_224x224.npz')

In [None]:
cats = f['cats']

In [None]:
cats.shape

In [None]:
dogs = f['dogs']

In [None]:
dogs.shape

Let's see a couple of example images:

In [None]:
plt.imshow(cats[0]);

In [None]:
plt.imshow(dogs[9]);

To make it easy to view several images and feature maps at once, we'll first define a couple of utility functions.  By default, `show_images` displays the first ten images of the specified dataset.  The optional `which` keyword can be used to specify a single image index or a sequence of indices.  The `rows` and `cols` keywords set the layout of the display grid, so at most `rows` &times; `cols` images are shown.

In [None]:
# general-purpose utility function to display dataset images.

def show_images(images, which=None, rows=2, cols=5):
    # which can be an index number like 0 or a sequence like [0, 2, 4] or range(10)
    if which is None:
        # default: consider every image (display stops after rows*cols images)
        which = range(len(images))
    elif isinstance(which, int):
        which = [which]
    elif not isinstance(which, (tuple, list, range)):
        print("Please specify an image index or a sequence of image indices")
        return
    plt.figure(figsize=(3*cols,3*rows))  # (width, height) in inches
    k = 0
    for i in which:
        if 0 <= i < len(images):
            k += 1
            plt.subplot(rows, cols, k)
            plt.title(f'{i}')
            plt.axis('off')
            plt.imshow(images[i])
            if k == rows*cols:
                break
    if k == 0:
        print("No such image")

In [None]:
show_images(cats)

In [None]:
show_images(cats, which=[11, 22, 33, 44])

In [None]:
show_images(dogs)

In [None]:
# general-purpose utility function to display the feature maps
# for different channels of a VGG16 convolutional network layer.
#
# example: show_channels(vgg16, image, 'block1_conv1', channels=range(0,20))

from tensorflow.keras.applications.imagenet_utils import preprocess_input
from tensorflow.keras.models import Model

def show_channels(network, image, layer_name, channels=range(20), cmap='gray', cols=5):
    # channels can be a number like 0 or a sequence like [0, 2, 4] or range(10)
    layer_names = [layer.name for layer in network.layers]
    if layer_name not in layer_names:
        print(f'No such layer: {layer_name}')
        return
    # generate activation maps for layer_name
    input_tensor = network.input
    output_tensor = network.get_layer(layer_name).output
    activation_model = Model(inputs=input_tensor, outputs=output_tensor)
    # preprocess a float copy so the original image array is never modified
    batch = np.array([preprocess_input(image.astype('float32'))])
    # call the model directly; using predict() here caused a warning message
    output = activation_model(batch)[0].numpy()
    h, w, d = output.shape
    # display activation maps
    if type(channels) is int:
        channels = [channels]
    rows = len(channels) // cols
    if len(channels) > rows*cols:
        rows += 1
    plt.figure(figsize=(3*cols,3*rows))  # (width, height) in inches
    k = 0
    for channel in channels:
        if 0 <= channel < d:
            k += 1
            plt.subplot(rows, cols, k)
            plt.imshow(output[:,:,channel], cmap=cmap)
            plt.title(f'channel {channel}')
            plt.axis('off')
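
Before an image is passed to the network, `show_channels` runs it through `preprocess_input`.  In its default `'caffe'` mode, that function reorders the color channels from RGB to BGR and subtracts a fixed per-channel ImageNet mean, without rescaling.  The NumPy sketch below mimics that behavior for illustration only, assuming pixel values in the range 0 to 255; the name `caffe_style_preprocess` is hypothetical and is not part of Keras.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by the 'caffe' preprocessing mode
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype='float32')

def caffe_style_preprocess(rgb_image):
    """Sketch of 'caffe'-mode preprocessing for one RGB image with values 0-255."""
    x = rgb_image.astype('float32')    # work on a float copy of the image
    x = x[..., ::-1]                   # reorder channels from RGB to BGR
    return x - IMAGENET_BGR_MEANS      # zero-center each channel (no rescaling)

# A tiny 2x2 image in which every pixel is pure red: (R, G, B) = (255, 0, 0)
demo = np.zeros((2, 2, 3), dtype='uint8')
demo[..., 0] = 255

result = caffe_style_preprocess(demo)
print(result[0, 0])   # red now sits in the last (BGR) position
```

This is why the feature maps are computed from a transformed copy of the image, while `plt.imshow` is always given the original pixel values.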

Let's start with cat image \#11. 

In [None]:
plt.imshow(cats[11]);

We will run this image through the VGG16 network and then look at some of the feature maps generated on the first convolutional layer (`block1_conv1`).  During training, each of the layer's 64 filters learned to respond to different aspects of the input data.  Here are the feature maps generated by the first five filters (channels):

In [None]:
show_channels(vgg16, cats[11], 'block1_conv1', channels=range(0,5), cmap='gray')

Each feature map brings out a different aspect of the original image, although some feature maps are similar.  Notice that channel 0 seems to highlight edges with a lighter region on the left and a darker region on the right, while channel 4 highlights edges with the opposite polarity.  For completeness, here are all 64 feature maps generated by the `block1_conv1` layer for cat \#11.  Some channels appear to highlight eyes (*e.g.*, channels 12, 27, and 36), while others seem to highlight edges between light and dark regions (*e.g.*, channels 6, 7, and 23):

In [None]:
show_channels(vgg16, cats[11], 'block1_conv1', channels=range(0,64), cols=8)
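
To make the idea of an edge-sensitive filter concrete, here is a small NumPy sketch.  The two 3 &times; 3 filters are hand-made for illustration and are not the weights that VGG16 actually learned; `correlate2d_valid` is a hypothetical helper that slides a filter across a toy grayscale image the way a convolutional layer does (here without padding or a bias term), and `relu` stands in for the layer's activation function.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide kernel over image (no padding) and return the map of weighted sums."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses and set negative ones to zero."""
    return np.maximum(x, 0)

# Hand-made filter (an assumption, not a learned VGG16 filter):
# positive weights on the left, negative weights on the right
light_left = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]], dtype=float)
dark_left = -light_left   # same filter with the opposite polarity

# Toy grayscale image: a light vertical stripe (1.0) on a dark background (0.0)
image = np.zeros((6, 9))
image[:, 3:6] = 1.0

map_light_left = relu(correlate2d_valid(image, light_left))
map_dark_left = relu(correlate2d_valid(image, dark_left))

print(map_light_left[0])   # one row of the first feature map
print(map_dark_left[0])    # one row of the second feature map
```

The first map is nonzero only around the stripe's right edge, where light pixels lie to the left of dark ones, and the second map is nonzero only around the stripe's left edge.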

Next, let's look at the output of the first max-pooling layer, `block1_pool`.  The max-pooling operation keeps only the largest activation in each 2 &times; 2 window of a feature map.  This reduces the resolution of each map from 224 &times; 224 pixels to 112 &times; 112 pixels, while preserving the most strongly activated areas of each map.

In [None]:
show_channels(vgg16, cats[11], 'block1_pool', channels=range(0,10))
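
The effect of 2 &times; 2 max-pooling on a single feature map can be reproduced in NumPy.  The sketch below assumes non-overlapping windows (stride 2), as in VGG16; the helper name `max_pool_2x2` is our own and is not a Keras function.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Return the maximum of each non-overlapping 2x2 window of a 2-D array."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # drop a trailing odd row/column, then group the pixels into 2x2 windows
    windows = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1, 2, 0, 0],
                        [3, 4, 0, 5],
                        [0, 0, 9, 8],
                        [0, 6, 7, 0]])

pooled = max_pool_2x2(feature_map)
print(pooled)

# a 224x224 map shrinks to 112x112, as in block1_pool
big_pooled = max_pool_2x2(np.random.rand(224, 224))
print(big_pooled.shape)
```

Each output value is the largest of the four values in its window, so the 4 &times; 4 example shrinks to a 2 &times; 2 map containing 4, 5, 6, and 9.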

Each subsequent max-pooling operation further reduces the resolution of the feature maps, while progressively combining information from larger and larger regions of the original input image.  Here are the first ten feature maps generated at the `block4_pool` layer:

In [None]:
show_channels(vgg16, cats[11], 'block4_pool', channels=range(0,10))
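
To get a feel for how large those regions become, we can estimate the theoretical receptive field of a unit in each layer: the width, in input pixels, of the patch that can influence it.  The sketch below assumes only what the architecture specifies (3 &times; 3 convolutions with stride 1 and 2 &times; 2 max-pooling with stride 2) and ignores effects at the image border; `receptive_fields` is a hypothetical helper, not part of Keras.

```python
# number of convolutional layers in each of VGG16's five blocks
convs_per_block = [2, 2, 3, 3, 3]

def receptive_fields(convs_per_block):
    """Return {layer_name: receptive field width in input pixels}."""
    field = 1   # a single input pixel covers just itself
    jump = 1    # spacing, in input pixels, between neighboring units of the current layer
    fields = {}
    for b, n_convs in enumerate(convs_per_block, start=1):
        for c in range(1, n_convs + 1):
            field += (3 - 1) * jump    # a 3x3 convolution widens the field by 2 jumps
            fields[f'block{b}_conv{c}'] = field
        field += (2 - 1) * jump        # a 2x2 pooling window widens it by 1 jump
        jump *= 2                      # stride 2 doubles the spacing between units
        fields[f'block{b}_pool'] = field
    return fields

fields = receptive_fields(convs_per_block)
for name, size in fields.items():
    print(f'{name:14s} {size:3d} x {size:3d} input pixels')
```

Under these assumptions, a unit in `block1_pool` depends on a 6 &times; 6 patch of the input, while a unit in `block4_pool` depends on a 100 &times; 100 patch.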

### Tracing the Transformation of Information

Let's look at a specific image and watch how it gets transformed as it progresses through the network layer by layer. We will start with dog image \#9 and show the first five feature maps generated at each convolutional and max-pooling layer of the network.

In [None]:
input_image = dogs[9]
channels = range(0,5)

In [None]:
plt.imshow(input_image);

In [None]:
show_channels(vgg16, input_image, 'block1_conv1', channels)

In [None]:
show_channels(vgg16, input_image, 'block1_conv2', channels)

In [None]:
show_channels(vgg16, input_image, 'block1_pool', channels)

In [None]:
show_channels(vgg16, input_image, 'block2_conv1', channels)

In [None]:
show_channels(vgg16, input_image, 'block2_conv2', channels)

In [None]:
show_channels(vgg16, input_image, 'block2_pool', channels)

In [None]:
show_channels(vgg16, input_image, 'block3_conv1', channels)

In [None]:
show_channels(vgg16, input_image, 'block3_conv2', channels)

In [None]:
show_channels(vgg16, input_image, 'block3_conv3', channels)

In [None]:
show_channels(vgg16, input_image, 'block3_pool', channels)

In [None]:
show_channels(vgg16, input_image, 'block4_conv1', channels)

In [None]:
show_channels(vgg16, input_image, 'block4_conv2', channels)

In [None]:
show_channels(vgg16, input_image, 'block4_conv3', channels)

In [None]:
show_channels(vgg16, input_image, 'block4_pool', channels)

In [None]:
show_channels(vgg16, input_image, 'block5_conv1', channels)

In [None]:
show_channels(vgg16, input_image, 'block5_conv2', channels)

In [None]:
show_channels(vgg16, input_image, 'block5_conv3', channels)

In [None]:
show_channels(vgg16, input_image, 'block5_pool', channels)
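
The same sequence of calls can also be written as a loop over the layer names.  The sketch below is one possible way to do this; `conv_and_pool_layer_names` and `trace_layers` are hypothetical helpers that rely only on the `blockN_convM` / `blockN_pool` naming pattern of the VGG16 layers, and the display function is passed in as an argument so that the block stands on its own.

```python
def conv_and_pool_layer_names(layer_names):
    """Keep only the convolutional and max-pooling layer names, in network order."""
    return [name for name in layer_names if '_conv' in name or '_pool' in name]

def trace_layers(network, image, channels, show):
    """Call show(network, image, layer_name, channels) for each conv/pool layer."""
    names = conv_and_pool_layer_names([layer.name for layer in network.layers])
    for name in names:
        show(network, image, name, channels)
    return names

# quick check of the name filter on a few VGG16-style layer names
example_names = ['input_1', 'block1_conv1', 'block1_conv2', 'block1_pool',
                 'flatten', 'fc1', 'fc2', 'predictions']
selected = conv_and_pool_layer_names(example_names)
print(selected)
```

With the network and utility function defined earlier, `trace_layers(vgg16, input_image, channels, show_channels)` would produce one figure per layer, in the same order as the individual cells above.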