Due by class time Tuesday, February 28
In this assignment, you will implement a genetic algorithm to evolve control strategies for Robby the Robot, as described in Chapter 9 of Complexity: A Guided Tour by Melanie Mitchell. A control strategy will be represented as a string of characters that code for the following robot actions: MoveNorth, MoveSouth, MoveEast, MoveWest, StayPut, PickUpCan, and MoveRandom.
Your GA will maintain a population of genomes representing control strategies. Each genome will be a 243-character string specifying the action that Robby should take for every possible situation he might find himself in. Robby's "situation" will be encoded as a 5-character percept string specifying what Robby currently sees in his immediate vicinity. For example, the string 'WECWC' means there is a wall to the north, an empty grid cell to the south, a soda can to the east, another wall to the west, and a soda can in Robby's own grid cell. Each percept string corresponds to a unique code number in the range 0-242, which indicates the (0-based) index number of the situation, as illustrated in Table 9-1 on page 132 of the reading. The job of your GA is to evolve a good control strategy that maximizes Robby's average cumulative reward as he collects empty soda cans for 200 time steps.
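Since each of the five percept positions takes one of three values (wall, empty, can), a percept string can be read as a five-digit base-3 number, which is where the range 0-242 comes from (3^5 = 243). Here is a minimal sketch of that conversion; the digit assignment ('E' = 0, 'C' = 1, 'W' = 2) and the digit ordering are assumptions on my part, so check them against Table 9-1 and rw.get_percept_code() before relying on this:

```python
# Sketch: map a 5-character percept string to a code number in 0-242.
# ASSUMPTION: 'E' (empty) = 0, 'C' (can) = 1, 'W' (wall) = 2, with the
# first character (North) as the most significant base-3 digit.
DIGITS = {'E': 0, 'C': 1, 'W': 2}

def percept_code(percept):
    code = 0
    for ch in percept:  # percept order: North, South, East, West, Here
        code = code * 3 + DIGITS[ch]
    return code
```

Under these assumptions, 'EEEEE' maps to 0 and 'WWWWW' maps to 242. If your computed codes disagree with rw.get_percept_code(), adjust the digit values or ordering to match the simulator.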
To use the simulator, download the file robby-2.1.1.zip and unzip it. This will create a folder named robby. Put this folder in the same location as your Python program file for this assignment. IMPORTANT: Do not put your Python file or any other files inside the robby folder, or modify any of the code in the folder. Then put the following lines at the top of your program, which will create a 10 × 10 grid world named rw when your file is loaded into Python:
import robby
rw = robby.World(10, 10)
To interact with the world, you can use the following commands:
rw.get_current_position() returns the current 0-based row, column position of Robby in the grid.
rw.get_percept() returns the percept string specifying what Robby currently sees to the North, South, East, West, and Here (in that order).
rw.get_percept_code() returns the code number of the current percept string as an integer in the range 0-242.
rw.distribute_cans(density=0.50) randomly distributes soda cans throughout the world, with 0 < density < 1 specifying the probability of a can occupying a grid cell. The default value is 0.50.
rw.goto(row, column) moves Robby to the specified grid location.
rw.perform_action(action) causes Robby to perform the specified action, where action is one of the strings "MoveNorth", "MoveSouth", "MoveEast", "MoveWest", "StayPut", "PickUpCan", or "MoveRandom". The value returned is the amount of reward or punishment received for the action.
The following abbreviations are provided for convenience:
rw.north() = rw.perform_action("MoveNorth")     rw.stay() = rw.perform_action("StayPut")
rw.south() = rw.perform_action("MoveSouth")     rw.grab() = rw.perform_action("PickUpCan")
rw.east()  = rw.perform_action("MoveEast")      rw.move() = rw.perform_action("MoveRandom")
rw.west()  = rw.perform_action("MoveWest")      rw.look() = rw.get_percept()
rw.graphics_off(message="") turns off the graphics. This is useful for simulating many actions at high speed when evaluating the fitness of a strategy. The optional message string will be displayed while the graphics are turned off.
rw.graphics_on() turns the graphics back on, and updates the grid to reflect the current state of the world.
rw.show() prints out a non-graphical representation of the current state of the world.
rw.demo(strategy, steps=200, init=0.50) turns on the graphics and demos a strategy for the specified number of simulation steps (default 200), with soda cans randomly distributed at density init (default 0.50). Optionally, init can be a string specifying the name of a grid world configuration file created with the save command. Note: this command cannot be used to calculate the fitness of a strategy; it simply shows the strategy in action.
rw.save(filename) saves the current grid world configuration as a text file, where filename is a string.
rw.load(filename) loads a grid world configuration from a text file, where filename is a string.
rw.strategyM is a string representing the hand-coded strategy M created by Melanie Mitchell, as described in the reading.
For example, you can watch strategy M in action by typing the command rw.demo(rw.strategyM) at the Python prompt.
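Putting the commands above together, evaluating a strategy boils down to simulating a cleaning session and summing the rewards. Here is a minimal sketch of such a helper; the function name run_session is mine, and it assumes (a) a world object with the methods listed above, (b) that each character of the genome is a digit '0'-'6' indexing into the seven actions, and (c) that Robby starts each session at grid position (0, 0) — verify those choices against your own design:

```python
def run_session(world, strategy, steps=200, density=0.50):
    """Sketch: run one cleaning session and return the total reward.

    ASSUMPTIONS: `world` provides the distribute_cans, goto,
    get_percept_code, and perform_action methods described above;
    each strategy character is a digit '0'-'6' indexing ACTIONS;
    Robby starts at (0, 0).
    """
    ACTIONS = ["MoveNorth", "MoveSouth", "MoveEast", "MoveWest",
               "StayPut", "PickUpCan", "MoveRandom"]
    world.distribute_cans(density)
    world.goto(0, 0)
    total = 0
    for _ in range(steps):
        code = world.get_percept_code()         # which situation is Robby in?
        action = ACTIONS[int(strategy[code])]   # look up the genome's action
        total += world.perform_action(action)   # accumulate reward/punishment
    return total
```

Averaging run_session over 100 fresh sessions (with rw.graphics_off() in effect) gives the fitness of a strategy.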
I would recommend using the following parameters for your GA:
The fitness value of a control strategy should be the total reward accumulated during a cleaning session when following the strategy, averaged over 100 cleaning sessions. However, control strategies can have negative fitness values, which won't work with fitness-proportionate selection, so for this assignment you will use rank selection instead. To implement rank selection, first calculate the fitness of each genome in the population. Then sort the genomes by fitness (whether positive or negative), in increasing order. The selection weight of a genome will be its rank number, from 1 (lowest fitness) to 200 (highest, equal to the population size), instead of the fitness value itself. Here is one way in Python to sort a list of genomes based on fitness. Assuming that a function called fitness is defined for genomes, the function sort_by_fitness shown below takes a list of genomes (strategy strings) and returns two new lists: (1) the genomes sorted into ascending order by fitness, and (2) their corresponding fitness values:
def sort_by_fitness(population):
    tuples = [(fitness(g), g) for g in population]
    tuples = sorted(tuples)
    sorted_fitness_values = [f for (f, g) in tuples]
    sorted_population = [g for (f, g) in tuples]
    return sorted_population, sorted_fitness_values
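Once the population is in ascending order of fitness, rank selection is easy to express with random.choices, because the genome at 0-based index i has selection weight i + 1. A minimal sketch (the helper name rank_select is mine):

```python
import random

def rank_select(sorted_population, k=2):
    """Sketch: pick k genomes by rank selection from a population that is
    already sorted into ascending order by fitness. The genome at 0-based
    index i gets selection weight i + 1 (rank 1 = lowest fitness)."""
    n = len(sorted_population)
    ranks = range(1, n + 1)   # weights 1..n
    return random.choices(sorted_population, weights=ranks, k=k)
```

random.choices samples with replacement, which is the usual choice here: the same high-ranked genome may legitimately be selected for more than one crossover.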
Since calculating the fitness of a single genome involves simulating many cleaning sessions, and each cleaning session involves many simulated robot actions, evaluating the fitness of the entire population of genomes requires a large amount of computation, which can take a substantial amount of time. Thus it is very important to make sure that your GA only calculates the fitness of each genome once per generation — for example, by calling sort_by_fitness and storing the results. The stored fitness values can then be used as needed, to select genomes for crossover, to compute the average fitness of the population, and so on, instead of having to be recomputed multiple times. Otherwise your GA could end up taking several hours longer than necessary to run!
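One way to enforce the compute-once rule is to structure each generation as a single function that evaluates every genome exactly once and then reuses the sorted results for the statistics and for selection. A sketch of that structure, with the fitness function and the child-producing operator (crossover plus mutation) passed in as parameters — both names are placeholders, not part of the assignment:

```python
import random

def step_generation(population, fitness, make_child):
    """Sketch: produce the next generation, calling `fitness` exactly once
    per genome. The sorted results are reused for the population average,
    the best-of-generation record, and rank selection."""
    tuples = sorted((fitness(g), g) for g in population)   # one fitness call per genome
    sorted_fitness = [f for f, g in tuples]
    sorted_pop = [g for f, g in tuples]
    avg = sum(sorted_fitness) / len(sorted_fitness)
    best, best_fit = sorted_pop[-1], sorted_fitness[-1]
    ranks = range(1, len(sorted_pop) + 1)
    new_population = []
    while len(new_population) < len(population):
        p1, p2 = random.choices(sorted_pop, weights=ranks, k=2)
        new_population.append(make_child(p1, p2))   # crossover + mutation
    return new_population, avg, best, best_fit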
Run your GA for a total of 500 generations. Every 10 generations, your GA should record the best strategy from the current population, along with the strategy's fitness value, in an output file called GAoutput.txt. More specifically, each line of the file should contain four items separated by whitespace, in the following order: (1) the generation number, (2) the average fitness of the whole population for this generation, (3) the fitness value of the best strategy of this generation, and (4) the best strategy itself. You should also have your GA periodically demo the best strategy found so far (every 10 or 20 generations, for example). That way, as the evolution proceeds, you can watch Robby's performance improve as he learns to clean up his environment.
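The four-item line format can be produced with a plain formatted write. A minimal sketch, where the function name and the variable names are illustrative, not prescribed:

```python
# Sketch: append one record to the output file every 10th generation.
# generation, avg_fitness, best_fitness, and best_strategy are
# illustrative names for values computed by your GA.
def record(filename, generation, avg_fitness, best_fitness, best_strategy):
    with open(filename, "a") as f:
        f.write(f"{generation} {avg_fitness:.2f} {best_fitness:.2f} {best_strategy}\n")
```

Opening the file in append mode ("a") lets the GA write one line at a time as it runs, so a partial log survives even if the run is interrupted.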
You should also experiment with different GA settings (population size, number of generations, crossover and mutation rates, etc.) to see how quickly your GA can discover a really good strategy. Save the best strategy your GA managed to find in a text file called BestStrategy.txt, and write a short report summarizing your experiments, including the GA parameter settings that produced the best strategy.
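Experimenting with the rates is easiest if crossover and mutation are isolated as small operators whose rates are single parameters. A minimal sketch of single-point crossover and per-character mutation; the genome alphabet of digits '0'-'6' (one per action) is an assumption about your encoding:

```python
import random

GENES = "0123456"   # ASSUMPTION: one digit per action

def crossover(parent1, parent2):
    """Sketch: single-point crossover of two equal-length genome strings."""
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def mutate(genome, rate=0.005):
    """Sketch: replace each character with a random gene with probability `rate`."""
    return "".join(random.choice(GENES) if random.random() < rate else ch
                   for ch in genome)
```

With this split, a parameter sweep is just a pair of nested loops over candidate crossover and mutation rates, with everything else held fixed.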
Submit your Python program file, your GAoutput.txt data file, your BestStrategy.txt file, and a 3-4 page written report in PDF format of the experiments that you performed, using the Homework Upload Site. Your report should be as clear and complete as possible, and should include a brief introduction, a description of the experiments you performed, tables or graphs summarizing the results of your experiments, and a short discussion of your conclusions. Please also make sure to proofread your report carefully for correct spelling and grammar.
If you have questions about anything, don't hesitate to ask!