You will not need to do significant amounts of coding for this assignment, but training your networks will take time. It is recommended that you read through the assignment in its entirety first, and start early.
You will be working with image files with the following naming convention:
<userid>_<pose>_<expression>_<eyes>_<scale>.pgm
To start xv, just type "xv filename" at the Unix prompt. This will bring up an X window displaying the file. Clicking the right mouse button in the image window will toggle a control panel with a variety of buttons. Selecting Double Size from the Image Size menu doubles the displayed size of the image every time you click on it. You can also expand or shrink an image by placing the mouse pointer inside the image window and pressing > or <. This will be useful for viewing the quarter-resolution images.
You can also obtain pixel values by holding down the middle mouse button while moving the pointer in the image window. A text bar will be displayed, showing you the image coordinates and brightness value where the pointer is located.
To quit, just click on the Quit button or type q in one of the xv windows.
The code is located in /cs/cs152/hw4/code on turing and pccs. Copy either the Java or C version to your own directory and compile it.
Documentation is available for the Java version of the code, or for the original C version (in PostScript).
You should also copy all of the files found in the hw4/datasets subdirectory to your own directory. For convenience, you should place these files in the same directory as your code. These files contain lists of image files to be used for training and testing.
1. The code you have been given is currently set up to learn to recognize the person with user id glickman. Modify this code to implement a "sunglasses recognizer". That is, train a network which, when given an image as input, indicates whether the face in the image is wearing sunglasses or not. See the documentation for an overview of how to make changes to the code.
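As a concrete illustration, here is a minimal sketch of the kind of change involved, assuming a target-setting routine of roughly this shape (the class and method names are illustrative assumptions, not the given code's actual API; adapt them to the target-loading routine in the version you copied):

    // SunglassesTarget.java -- hypothetical sketch, not the assignment's actual code.
    class SunglassesTarget {
        // Set the single output target from the image file name. Filenames follow
        // <userid>_<pose>_<expression>_<eyes>_<scale>.pgm, so the fourth
        // underscore-separated field tells us whether sunglasses are worn.
        static void loadTarget(String imageFileName, double[] target) {
            String base = imageFileName.substring(imageFileName.lastIndexOf('/') + 1);
            String eyes = base.split("_")[3];
            // Targets of 0.9/0.1 rather than 1/0 keep the sigmoid away from its
            // asymptotes, a common convention in backprop code of this kind.
            target[0] = eyes.startsWith("sunglasses") ? 0.9 : 0.1;
        }
    }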
2. Train a network using the default learning parameter settings (learning rate 0.3, momentum 0.3) for 75 epochs, with the following command:
java facetrain -n shades.net -t straightrnd_train.list -1 straightrnd_test1.list -2 straightrnd_test2.list -e 75
facetrain's arguments are described in the documentation, but a short description is in order here. shades.net is the name of the network file that will be saved when training is finished. straightrnd_train.list, straightrnd_test1.list, and straightrnd_test2.list are text files that specify the training set (70 examples) and two test sets (34 and 52 examples), respectively.
This command creates and trains your net on a randomly chosen sample of 70 of the 156 "straight" images (namely, those whose names appear in the straightrnd_train.list file), and tests it on the remaining 86 images, split into the sets of 34 and 52. One way to think of this test strategy is that roughly 1/3 of the images (the 52 in straightrnd_test2.list) have been held out for final testing. The remaining 2/3 (104 images) are used in a train-and-cross-validate strategy, in which roughly 2/3 of those (the 70 in straightrnd_train.list) serve as the training set and the other 1/3 (the 34 in straightrnd_test1.list) serve as a validation set to decide when to halt training.
4. What code did you modify? What was the maximum classification accuracy achieved on the training set? How many epochs did it take to reach this level? How about for the validation set? The test set? Note that if you run facetrain again on the same system with the same parameters, you should get exactly the same results, because the code seeds its random number generator with the same value each time (see the facetrain code if you want to change this).
5. Now implement a 1-of-20 face recognizer. That is, implement a neural net that accepts an image as input and outputs the userid of the person, using some appropriate representation scheme. To do this, you will need to implement a different output encoding, since you must now be able to distinguish among 20 people. (Hint: leave learning rate and momentum at 0.3, and use 20 hidden units.)
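One natural scheme is a 1-of-20 encoding: one output unit per person, with the unit for the pictured person driven high and the rest low, and classification taken as the unit with the highest activation. A minimal sketch under those assumptions (the user-id table and method names are illustrative, not part of the given code):

    // OneOfTwenty.java -- hypothetical sketch of a 1-of-N output encoding.
    class OneOfTwenty {
        // Illustrative placeholder: list the 20 actual user ids from the
        // dataset here, in a fixed order.
        static final String[] USER_IDS = { "glickman", /* ...19 more... */ };

        // Set a 20-element target vector from the image file name; the first
        // underscore-separated field of the file name is the user id.
        static void loadTarget(String imageFileName, double[] target) {
            String base = imageFileName.substring(imageFileName.lastIndexOf('/') + 1);
            String userId = base.split("_")[0];
            for (int i = 0; i < target.length; i++) {
                target[i] = USER_IDS[i].equals(userId) ? 0.9 : 0.1;
            }
        }

        // Classify an output vector as the user id of the most active unit.
        static String classify(double[] output) {
            int best = 0;
            for (int i = 1; i < output.length; i++) {
                if (output[i] > output[best]) best = i;
            }
            return USER_IDS[best];
        }
    }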
6. As before, train the network, this time for 100 epochs:
java facetrain -n face.net -t straighteven_train.list -1 straighteven_test1.list -2 straighteven_test2.list -e 100
You might be wondering why you are only training on samples from a limited distribution (the "straight" images). The essential reason is training time. If you have access to a very fast machine, then you are welcome to do these experiments on the entire set (replace straight with all in the above command). Otherwise, stick to the "straight" images.
The difference between the straightrnd_*.list and straighteven_*.list sets is that while the former divides the images purely randomly among the training and testing sets, the latter ensures a relatively even distribution of each individual's images over the sets. Because we have only 7 or 8 "straight" images per individual, failure to distribute them evenly would result in testing our network the most on those faces on which it was trained the least.
7. Which parts of the code was it necessary to modify this time? How did you encode the outputs? What was the maximum classification accuracy achieved on the training set? How many epochs did it take to reach this level? How about the validation and test sets?
8. Now let's take a closer look at which images the net may have failed to classify:
java facetrain -n face.net -T -1 straighteven_test1.list -2 straighteven_test2.list
Do there seem to be any particular commonalities between the misclassified images?
9. Implement a pose recognizer. That is, implement a neural net that, when given an image as input, indicates whether the person in the image is looking straight ahead, up, to the left, or to the right. You will also need to implement a different output encoding for this task. (Hint: leave learning rate and momentum at 0.3 and use 6 hidden units.)
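The change is analogous to the 1-of-20 recognizer, just with four classes. A minimal sketch (again, the names are illustrative; the pose strings are assumed to match the <pose> field of the file naming convention):

    // PoseTarget.java -- hypothetical sketch of a 1-of-4 pose encoding.
    class PoseTarget {
        static final String[] POSES = { "straight", "left", "right", "up" };

        // Set a 4-element target vector; the second underscore-separated
        // field of the file name is the pose.
        static void loadTarget(String imageFileName, double[] target) {
            String base = imageFileName.substring(imageFileName.lastIndexOf('/') + 1);
            String pose = base.split("_")[1];
            for (int i = 0; i < target.length; i++) {
                target[i] = POSES[i].equals(pose) ? 0.9 : 0.1;
            }
        }
    }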
10. Train the network for 100 epochs, this time on samples drawn from all of the images:
java facetrain -n pose.net -t all_train.list -1 all_test1.list -2 all_test2.list -e 100
Since the pose-recognizing network should have substantially fewer weights to update than the face-recognizing network, even those of you with slow machines can get in on the fun of using all of the images. In this case, 260 examples are in the training set, 140 examples are in test1, and 193 are in test2.
11. How did you encode your outputs this time? What was the maximum classification accuracy achieved on the training set? How many epochs did it take to reach this level? How about for the validation and test sets?
12. Now try taking a look at how backpropagation tuned the weights of the hidden units with respect to each pixel. First type
java hidden2pgm pose.net image-filename h
Invoking xv on image-filename should then display the weights of hidden unit h, with the lowest-valued weights mapped to pixel values of zero and the highest mapped to 255. The bias of hidden unit h corresponds to the upper-left pixel of the image. If the images just look like noise, try retraining using initial weights of zero rather than random values (this will require changing a couple of lines in BPNN.java or backprop.c).
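The grey-level mapping described above amounts to a linear min-max rescaling of one hidden unit's weights into the range 0 to 255. The sketch below illustrates that mapping; it is an illustration of the idea, not the actual source of hidden2pgm:

    // WeightsToPixels.java -- sketch of min-max scaling weights to grey levels.
    class WeightsToPixels {
        static int[] toPixels(double[] weights) {
            double min = weights[0], max = weights[0];
            for (double w : weights) {
                if (w < min) min = w;
                if (w > max) max = w;
            }
            double range = (max > min) ? (max - min) : 1.0; // avoid dividing by zero
            int[] pixels = new int[weights.length];
            for (int i = 0; i < weights.length; i++) {
                // Lowest weight -> 0, highest -> 255, linear in between.
                pixels[i] = (int) Math.round(255.0 * (weights[i] - min) / range);
            }
            return pixels;
        }
    }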
You can also view the weights of the output units with the output2pgm utility. See the documentation for details.
13. Do the hidden units seem to weight particular regions of the image more heavily than others? Do particular hidden units seem to be tuned to different features of some sort?
Some possibilities are given below (but please don't let this limit your thinking):
Part 2
Turn in a single writeup of your (or your group's) experiments, describing what you did and what you concluded. As for Part 1, include hardcopy printouts of any code that you modified or added. If you wish, you may submit a complete, working copy of your code on turing (one copy per group, please), along with README instructions on how to compile and run it, so that I can see for myself what it does.