### Self-Check

Suppose a Conv2D layer with 64 filters and a 3 &times; 3 kernel receives RGB input images of shape 224 &times; 224 &times; 3.

1. How many **trainable parameters** will the layer have?

2. What will the layer's **output shape** be if the input image is **not padded** with extra zeros around the edges?

3. What will the layer's **output shape** be if the input image **is padded** with extra zeros around the edges?

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

In [None]:
cnn = Sequential()
cnn.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', input_shape=(224,224,3)))
cnn.summary()

In [None]:
cnn = Sequential()
cnn.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', input_shape=(224,224,3),
               padding="same"))
cnn.summary()
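
As a cross-check on the two summaries above, the parameter count and output sizes can be worked out by hand. The sketch below uses two hypothetical helper functions (they are not part of Keras) and assumes a stride of 1 and one bias per filter, which are the `Conv2D` defaults.

```python
# Hypothetical helpers for checking Conv2D arithmetic by hand (not Keras functions).

def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Each filter has kernel_h * kernel_w * in_channels weights, plus one optional bias."""
    kernel_h, kernel_w = kernel_size
    weights_per_filter = kernel_h * kernel_w * in_channels
    return filters * (weights_per_filter + (1 if use_bias else 0))

def conv2d_output_length(length, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension."""
    if padding == "same":
        return -(-length // stride)          # ceil(length / stride)
    return (length - kernel) // stride + 1   # no padding: the kernel must fit entirely

params = conv2d_param_count((3, 3), in_channels=3, filters=64)
valid_side = conv2d_output_length(224, 3, padding="valid")
same_side = conv2d_output_length(224, 3, padding="same")

print(params)                        # (3*3*3 + 1) * 64 = 1792
print((valid_side, valid_side, 64))  # (222, 222, 64)
print((same_side, same_side, 64))    # (224, 224, 64)
```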

## Working with a Pretrained VGG16 Convolutional Network

<img src="http://science.slc.edu/jmarshall/bioai/images/vgg16_architecture.jpg" width="75%">

The VGG16 network, developed by the Visual Geometry Group at Oxford, was trained on the large-scale ImageNet dataset, consisting of 1.4 million labeled images from 1,000 different categories.  Most of these images are of animals or other everyday objects, including many different breeds of cats and dogs.

[K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition (2014)](https://arxiv.org/abs/1409.1556) 

In [None]:
from tensorflow.keras.applications import VGG16

In [None]:
vgg16 = VGG16(weights='imagenet')

In [None]:
vgg16.summary()

### The Fruits Dataset

In [None]:
import random
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (2.5,2.5)

We will experiment with a dataset containing 100 images of **bananas**, 100 images of **oranges**, 100 images of **pineapples**, and 100 various **other** images.  First, we need to download it:

In [None]:
!curl -O science.slc.edu/jmarshall/bioai/data/fruits.npz

In [None]:
f = np.load('fruits.npz')

In [None]:
list(f.keys())

In [None]:
bananas, oranges, pineapples, other = f['bananas'], f['oranges'], f['pineapples'], f['other']

In [None]:
bananas.shape

In [None]:
def show_random_image(images):
    i = random.randrange(len(images))
    print(f"image {i}")
    plt.imshow(images[i])

In [None]:
show_random_image(bananas)

In [None]:
def show_random_selection(images):
    indices = range(len(images))
    plt.figure(figsize=(15,12))  # (width, height) in inches
    rows, columns = 5, 6
    for k in range(1, columns*rows+1):
        i = random.choice(indices)
        plt.subplot(rows, columns, k)
        plt.axis('off')
        plt.imshow(images[i])

In [None]:
show_random_selection(bananas)

In [None]:
show_random_selection(pineapples)

### Testing the Pretrained VGG16 Network

In [None]:
plt.imshow(bananas[0]);

In [None]:
bananas[0].shape

In [None]:
batch = bananas[0].reshape((1,224,224,3))

In [None]:
batch.shape

In [None]:
# alternative approach
batch = np.array([bananas[0]])  # note the extra []'s

In [None]:
batch.shape

In [None]:
output = vgg16.predict(batch)

In [None]:
output.shape

In [None]:
np.argmax(output[0])

In [None]:
output[0][np.argmax(output[0])]  # score of the top-ranked class

In [None]:
from tensorflow.keras.applications.imagenet_utils import decode_predictions

In [None]:
# takes a batch of predictions and returns, for each one, a list of
# (wordnet_id, class_name, score) tuples, highest score first
decode_predictions(output)

In [None]:
decode_predictions(output, top=3)
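
The `top` argument controls how many of the highest-scoring classes are returned for each image. The ranking step itself can be sketched with plain NumPy. The helper `top_k_indices` and the five-class score vector below are made up for illustration and do not reproduce how `decode_predictions` is implemented.

```python
import numpy as np

def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, highest first."""
    order = np.argsort(scores)[::-1]   # argsort is ascending, so reverse it
    return order[:k]

toy_scores = np.array([0.05, 0.60, 0.10, 0.20, 0.05])  # made-up scores for 5 classes
top3 = top_k_indices(toy_scores, k=3)

print(top3)              # [1 3 2]
print(toy_scores[top3])  # [0.6 0.2 0.1]
```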

In [None]:
# version 1

def classify(network, image):
    plt.imshow(image)
    plt.axis('off')
    input_batch = np.array([image])  # note the extra []'s
    output_batch = network.predict(input_batch)
    predictions_batch = decode_predictions(output_batch)
    prediction = predictions_batch[0]
    print(prediction)

In [None]:
classify(vgg16, bananas[0])

Let's improve the appearance of the output.

In [None]:
# version 2: improved prediction descriptions

def classify(network, image):
    plt.imshow(image)
    plt.axis('off')
    input_batch = np.array([image])  # note the extra []'s
    output_batch = network.predict(input_batch)
    predictions_batch = decode_predictions(output_batch)
    prediction = predictions_batch[0]
    # added this:
    for i, guess in enumerate(prediction):
        wordnet_id, name, confidence = guess
        print(f"{i+1}. {name} ({confidence*100:.1f}%)")

In [None]:
classify(vgg16, bananas[0])

In [None]:
classify(vgg16, random.choice(bananas))

The network's performance seems pretty bad.  Maybe we should convert the images to floats in the range [0., 1.]?

In [None]:
bananas.dtype, bananas.min(), bananas.max()

In [None]:
banana_floats = bananas.astype('float32') / 255

In [None]:
banana_floats.dtype, banana_floats.min(), banana_floats.max()

In [None]:
plt.imshow(banana_floats[0]);

In [None]:
classify(vgg16, random.choice(banana_floats))

The performance seems just as bad, or even worse!  What's going on?

According to the published paper, the VGG16 network was trained on images that were preprocessed by "subtracting the mean RGB value, computed on the training set, from each pixel" [(Simonyan and Zisserman, 2014)](https://arxiv.org/abs/1409.1556).  In Keras, there is a function available that does this transformation, called `preprocess_input`.
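
To make the quoted transformation concrete, here is a minimal NumPy sketch of this style of preprocessing. It assumes the convention that the Keras documentation describes for VGG16: channels are reordered from RGB to BGR and each channel is zero-centered using the ImageNet means, with no scaling. The function name `caffe_style_preprocess` is hypothetical, and the sketch is an illustration rather than the library's own code.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as commonly quoted for VGG-style models.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(images):
    """Hypothetical stand-in: RGB -> BGR, then subtract the per-channel means."""
    x = np.asarray(images).astype(np.float32)  # astype copies, so the input is untouched
    x = x[..., ::-1]                           # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_BGR_MEANS              # broadcasts over the last (channel) axis

toy_batch = np.full((1, 2, 2, 3), 255, dtype=np.uint8)  # a tiny all-white "image"
processed = caffe_style_preprocess(toy_batch)
print(processed.dtype, processed.min(), processed.max())
```

The output is a float array whose values are no longer confined to [0, 255], which matches what the cells below show for the real `preprocess_input`.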

In [None]:
from tensorflow.keras.applications.imagenet_utils import preprocess_input

In [None]:
bananas.dtype, bananas.min(), bananas.max()

In [None]:
processed_bananas = preprocess_input(bananas)

In [None]:
processed_bananas.shape

In [None]:
processed_bananas.dtype, processed_bananas.min(), processed_bananas.max()

In [None]:
bananas[0]

In [None]:
processed_bananas[0]

### Pitfalls to Avoid

<font color="red">**WARNING**:</font> `preprocess_input` should only be called on images of **integer values in the range [0, 255]**, not floats in the range [0., 1.]

In [None]:
banana_floats.dtype, banana_floats.min(), banana_floats.max()

In [None]:
processed_banana_floats = preprocess_input(banana_floats)

In [None]:
processed_banana_floats.dtype, processed_banana_floats.min(), processed_banana_floats.max()

The resulting values are all negative, because the function subtracts per-channel means of roughly 104 to 124, which only makes sense for inputs in the range [0, 255].  Furthermore:

In [None]:
banana_floats.dtype, banana_floats.min(), banana_floats.max()

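`preprocess_input` **completely overwrote** <tt>banana_floats</tt>!  Apparently this function modifies arrays of float values in place, but leaves arrays of integer values intact, presumably because integer input is first converted to a new float array while float input is used as-is.  This inconsistency is easy to trip over.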
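
The difference between float and integer inputs can be reproduced with plain NumPy. The sketch below uses a hypothetical function, `subtract_mean_in_place`, that relies on `astype(..., copy=False)`, which returns the original array when no conversion is needed. This is an assumption about the mechanism, not Keras's actual code.

```python
import numpy as np

def subtract_mean_in_place(x, mean=100.0):
    """Hypothetical stand-in for a preprocessing function that works in place."""
    x = x.astype(np.float32, copy=False)  # returns x itself if it is already float32
    x -= mean                             # in-place subtraction
    return x

ints = np.full((2, 2), 200, dtype=np.uint8)
floats = np.full((2, 2), 0.5, dtype=np.float32)

subtract_mean_in_place(ints)    # uint8 -> float32 conversion creates a new array
subtract_mean_in_place(floats)  # already float32, so the original is modified

print(ints.max())    # 200 (unchanged)
print(floats.max())  # -99.5 (overwritten)

# Passing a copy protects the original array.
safe = np.full((2, 2), 0.5, dtype=np.float32)
result = subtract_mean_in_place(safe.copy())
print(safe.max())    # 0.5 (unchanged)
```

Under the same assumption, calling `preprocess_input(banana_floats.copy())` would leave `banana_floats` untouched.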

In [None]:
# redefine banana_floats
banana_floats = bananas.astype('float32') / 255

In [None]:
banana_floats.dtype, banana_floats.min(), banana_floats.max()

In [None]:
plt.imshow(banana_floats[0]);

In [None]:
bananas.dtype, bananas.min(), bananas.max()

In [None]:
plt.imshow(bananas[0]);

In [None]:
processed_bananas.dtype, processed_bananas.min(), processed_bananas.max()

In [None]:
plt.imshow(processed_bananas[0]);

We cannot display the processed images properly with <tt>plt.imshow</tt>, because they are neither floats in the range [0., 1.] nor integers in the range [0, 255].  Matplotlib clips the out-of-range values, which distorts the colors.
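
If a viewable version of a processed image is needed, the transformation can be approximately inverted. The sketch below assumes the preprocessing was the RGB-to-BGR reordering and per-channel mean subtraction that the Keras documentation describes for VGG16. The helper name `deprocess_for_display` and the mean values are assumptions of this example.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match the preprocessing step).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def deprocess_for_display(processed):
    """Hypothetical inverse: add the means back, flip BGR -> RGB, convert to uint8."""
    x = np.asarray(processed, dtype=np.float32) + IMAGENET_BGR_MEANS  # new array
    x = x[..., ::-1]                                                  # BGR -> RGB
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)

# Round-trip check on a single made-up RGB pixel.
rgb = np.array([[[255, 128, 0]]], dtype=np.uint8)
processed = rgb[..., ::-1].astype(np.float32) - IMAGENET_BGR_MEANS
restored = deprocess_for_display(processed)
print(np.array_equal(restored, rgb))  # True
```

With this helper, something like `plt.imshow(deprocess_for_display(processed_bananas[0]))` should show the original colors again.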

### Classifying the Preprocessed Images

In [None]:
# version 3: uses preprocess_input

def classify(network, image):
    # verify we have the right type of image
    if not (image.dtype == 'uint8' and 0 <= image.min() <= 255 and 0 <= image.max() <= 255):
        print("Sorry, image must be integers in the range [0, 255]")
        return
    plt.imshow(image)
    plt.axis('off')
    input_batch = np.array([image])  # note the extra []'s
    output_batch = network.predict(preprocess_input(input_batch))  # changed
    predictions_batch = decode_predictions(output_batch)
    prediction = predictions_batch[0]
    for i, guess in enumerate(prediction):
        wordnet_id, name, confidence = guess
        print(f"{i+1}. {name} ({confidence*100:.1f}%)")

In [None]:
classify(vgg16, random.choice(processed_bananas))  # rejected: these images are already preprocessed floats

In [None]:
classify(vgg16, random.choice(bananas))

### Using a Lambda Layer to Preprocess the Images Automatically

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda

In [None]:
def build_network():
    network = Sequential()
    network.add(Lambda(preprocess_input, input_shape=(224,224,3)))
    network.add(VGG16(weights='imagenet'))
    return network

In [None]:
network = build_network()

In [None]:
network.summary()

In [None]:
network.layers

In [None]:
network.layers[1].summary()

In [None]:
# version 4: no longer needs to call preprocess_input explicitly

def classify(network, image):
    # verify we have the right type of image
    if not (image.dtype == 'uint8' and 0 <= image.min() <= 255 and 0 <= image.max() <= 255):
        print("Sorry, image must be integers in the range [0, 255]")
        return
    plt.imshow(image)
    plt.axis('off')
    input_batch = np.array([image])  # note the extra []'s
    output_batch = network.predict(input_batch)  # changed
    predictions_batch = decode_predictions(output_batch)
    prediction = predictions_batch[0]
    for i, guess in enumerate(prediction):
        wordnet_id, name, confidence = guess
        print(f"{i+1}. {name} ({confidence*100:.1f}%)")


In [None]:
classify(network, random.choice(bananas))

In [None]:
classify(network, random.choice(oranges))

In [None]:
classify(network, random.choice(pineapples))

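Surprisingly, the network is no good at identifying watermelons, tomatoes, apples (except for Granny Smiths), or pumpkins (except for jack-o-lanterns).  The likely reason is that the 1,000 ImageNet categories include Granny Smith and jack-o'-lantern classes, but no general watermelon, tomato, apple, or pumpkin classes.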

In [None]:
classify(network, random.choice(other))

### Cats and Dogs

In [None]:
!curl -O science.slc.edu/jmarshall/bioai/data/cats_dogs_100_100_224x224.npz

In [None]:
f = np.load('cats_dogs_100_100_224x224.npz')

In [None]:
cats, dogs = f['cats'], f['dogs']

In [None]:
cats.shape

In [None]:
dogs.shape

In [None]:
show_random_selection(cats)

In [None]:
show_random_selection(dogs)

In [None]:
classify(network, random.choice(dogs))

### Visualizing Feature Maps

In [None]:
network.summary()

In [None]:
network.layers

In [None]:
vgg16 = network.get_layer('vgg16')

In [None]:
vgg16.summary()

In [None]:
vgg16.layers

In [None]:
[layer.name for layer in vgg16.layers]

In [None]:
vgg16.layers[0]

In [None]:
vgg16.layers[0].input

In [None]:
vgg16.get_layer('block1_conv1')

In [None]:
vgg16.get_layer('block1_conv1').output

In [None]:
from tensorflow.keras.models import Model

input_tensor = vgg16.layers[0].input
output_tensor = vgg16.get_layer('block1_conv1').output
model = Model(inputs=input_tensor, outputs=output_tensor)

In [None]:
batch = np.array([preprocess_input(bananas[0])])

In [None]:
batch.shape

In [None]:
batch.dtype, batch.min(), batch.max()

In [None]:
output = np.array(model(batch))

In [None]:
output.shape

In [None]:
feature_maps = output[0]

In [None]:
feature_maps.shape

In [None]:
feature_maps.dtype, feature_maps.min(), feature_maps.max()

In [None]:
plt.imshow(bananas[0]);

In [None]:
plt.imshow(feature_maps[:,:,5], cmap='gray');

In [None]:
# this function was called "show_channels" before

# general-purpose utility function to display VGG16 feature maps for an input image
# example: show_features(vgg16, image, 'block1_conv1')

from tensorflow.keras.models import Model

def show_features(vgg16, image, layer_name, features=range(20), cmap='gray', cols=5):
    # features can be a number like 0 or a sequence like [0, 2, 4] or range(10)
    layer_names = [layer.name for layer in vgg16.layers]
    if layer_name not in layer_names:
        print(f"No such layer: {layer_name}")
        return
    # generate feature maps for layer_name
    input_tensor = vgg16.layers[0].input
    output_tensor = vgg16.get_layer(layer_name).output
    model = Model(inputs=input_tensor, outputs=output_tensor)
    input_batch = np.array([preprocess_input(image)])
    output_batch = np.array(model(input_batch))
    output = output_batch[0]
    h, w, d = output.shape
    # display image
    plt.axis('off')
    plt.imshow(image)
    # display feature maps
    if isinstance(features, int):
        features = [features]
    rows = len(features) // cols
    if len(features) > rows*cols:
        rows += 1
    fig = plt.figure(figsize=(13, 13/cols*rows))
    i = 1
    for feature in features:
        if 0 <= feature < d:
            fig.add_subplot(rows, cols, i)
            i += 1
            plt.imshow(output[:,:,feature], cmap=cmap)
            plt.title(f"feature {feature}")
            plt.axis('off')

In [None]:
show_features(vgg16, bananas[0], 'block1_conv1')

In [None]:
show_features(vgg16, bananas[0], 'block1_conv1', features=range(20,64))

In [None]:
show_features(vgg16, bananas[0], 'block1_pool')

In [None]:
vgg16.summary()

In [None]:
show_features(vgg16, bananas[2], 'block3_conv1')

In [None]:
show_features(vgg16, bananas[2], 'block3_pool')

In [None]:
show_features(vgg16, random.choice(cats), 'block1_pool')