f(00) --> 0.5
f(01) --> 0.9
f(10) --> 0.1
f(11) --> 0.0
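A minimal sketch (assuming Python) of how a fitness table like this can drive fitness-proportionate ("roulette-wheel") selection, one common reading of the "probabilistic function of fitness" step in the algorithm below; the function name and the roulette-wheel choice are illustrative, not from the original:

```python
import random

# Toy fitness function over 2-bit chromosomes, from the table above.
fitness = {"00": 0.5, "01": 0.9, "10": 0.1, "11": 0.0}

def roulette_select(population):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness[c] for c in population)
    r = random.uniform(0, total)
    running = 0.0
    for c in population:
        running += fitness[c]
        if running >= r:
            return c
    return population[-1]

# With all four chromosomes present, the selection probabilities are
# 0.5/1.5, 0.9/1.5, 0.1/1.5, and 0.0/1.5 respectively.
print(roulette_select(["00", "01", "10", "11"]))
```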
2. Calculate fitness f(x) of each chromosome in the population
3. Repeat until N offspring have been created:
   - Select a pair of chromosomes from the current population as a probabilistic function of fitness
   - Perform crossover on the chromosomes with probability pc
   - Mutate each bit of the offspring chromosomes with probability pm
   - Add the offspring to the new population
4. Replace the current population with the new population
5. Go to step 2
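A compact sketch of this loop, assuming Python, bit-string chromosomes, and single-point crossover; the helper names and the default values of pc, pm, and the generation count are illustrative, not taken from the original:

```python
import random

def evolve(population, fitness, pc=0.7, pm=0.001, generations=50):
    """One rendering of steps 2-5: build N offspring per generation, then replace."""
    N = len(population)
    for _ in range(generations):
        scores = [fitness(c) for c in population]            # step 2
        new_population = []
        while len(new_population) < N:                       # step 3
            a = select(population, scores)
            b = select(population, scores)
            if random.random() < pc:                         # crossover with probability pc
                point = random.randrange(1, len(a))
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            new_population += [mutate(a, pm), mutate(b, pm)]
        population = new_population[:N]                      # step 4
    return population                                        # looping = step 5

def select(population, scores):
    """Select one chromosome with probability proportional to its fitness."""
    return random.choices(population, weights=scores, k=1)[0]

def mutate(chrom, pm):
    """Flip each bit independently with probability pm."""
    return "".join(("1" if b == "0" else "0") if random.random() < pm else b
                   for b in chrom)
```

With the toy fitness table above, for example, the population will usually come to be dominated by copies of 01, the fittest string.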
Chromosome | Fitness (number of 1s) |
A: 00000110 | 2 |
B: 11101110 | 6 |
C: 00100000 | 1 |
D: 00110100 | 3 |
Average fitness of population = 12/4 = 3.0
1. B and C selected, crossover not performed
2. B mutated
B: 11101110 ----> B': 01101110
3. B and D selected, crossover performed (single point, after bit 1)
B: 11101110
D: 00110100
  ----> E: 10110100
        F: 01101110
4. E mutated
E: 10110100 ----> E': 10110000
New population:
Chromosome | Fitness |
B': 01101110 | 5 |
C: 00100000 | 1 |
E': 10110000 | 3 |
F: 01101110 | 5 |
The best-fit string from the previous population was lost, but the average fitness of the population is now 14/4 = 3.5
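The generation above can be reproduced deterministically with a short sketch (Python assumed; the helper names are illustrative), using number-of-1s fitness, the single-point crossover of B and D, and the two bit flips shown:

```python
def ones_fitness(chrom):
    """Fitness = number of 1 bits in the chromosome."""
    return chrom.count("1")

def one_point_crossover(a, b, point):
    """Swap tails after the given crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def flip_bit(chrom, i):
    """Mutate by flipping the bit at 0-based position i."""
    flipped = "0" if chrom[i] == "1" else "1"
    return chrom[:i] + flipped + chrom[i + 1:]

old = {"A": "00000110", "B": "11101110", "C": "00100000", "D": "00110100"}
print(sum(ones_fitness(c) for c in old.values()) / 4)     # 3.0

B_prime = flip_bit(old["B"], 0)                           # 11101110 -> 01101110
E, F = one_point_crossover(old["B"], old["D"], 1)         # 10110100, 01101110
E_prime = flip_bit(E, 5)                                  # 10110100 -> 10110000

new = {"B'": B_prime, "C": old["C"], "E'": E_prime, "F": F}
print(sum(ones_fitness(c) for c in new.values()) / 4)     # 3.5
```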
See also: the Pyro pages on evolutionary algorithms
Schemas: e.g., the schema **1**0* (where * is a wildcard matching either bit) stands for all strings with a 1 in position 3 and a 0 in position 6:
**1**0* --> { 1110000, 0010001, 0111001, 0010000, ... }
Every schema s has an estimated average fitness f(s), determined by the fitness function and the current instances of s in the population
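A sketch (Python assumed; the function names are illustrative) of checking schema membership and estimating f(s) as the average fitness of a schema's instances in the current population:

```python
def matches(schema, chrom):
    """A chromosome is an instance of a schema if it agrees on every fixed bit."""
    return len(schema) == len(chrom) and all(
        s == "*" or s == c for s, c in zip(schema, chrom))

def estimated_schema_fitness(schema, population, fitness):
    """f(s): average fitness of the schema's instances present in the population."""
    instances = [c for c in population if matches(schema, c)]
    if not instances:
        return None            # schema currently has no instances
    return sum(fitness(c) for c in instances) / len(instances)

print(matches("**1**0*", "1110000"))   # True
print(matches("**1**0*", "1100000"))   # False (position 3 is 0, not 1)
```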
Schema Theorem (John Holland, 1975):
Expected[ N(t+1) ] ≥ [ f(s) / f(pop) ] · N(t) · kc · km
where N(t) is the number of instances of schema s in the population at generation t, f(s) is the estimated average fitness of those instances, f(pop) is the average fitness of the whole population, and kc and km are factors (≤ 1) accounting for the chance that crossover or mutation disrupts s
Above-average schemas will tend to spread through the population, while below-average schemas will tend to disappear
This happens simultaneously for all schemas present in the population ("implicit parallelism")
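As a rough numeric illustration (Python assumed; the 8-bit schema 01****** is an example chosen here, not from the original), the bound can be evaluated on the new population from the example above, ignoring the crossover and mutation terms (kc = km = 1):

```python
def ones_fitness(chrom):
    return chrom.count("1")

def matches(schema, chrom):
    return all(s == "*" or s == c for s, c in zip(schema, chrom))

population = ["01101110", "00100000", "10110000", "01101110"]       # B', C, E', F
schema = "01******"                                                  # example schema

instances = [c for c in population if matches(schema, c)]            # B' and F
f_s = sum(ones_fitness(c) for c in instances) / len(instances)       # 5.0
f_pop = sum(ones_fitness(c) for c in population) / len(population)   # 3.5
N_t = len(instances)                                                 # 2

# With kc = km = 1, the expected number of instances in the next
# generation is at least (5.0 / 3.5) * 2, i.e. about 2.9.
print((f_s / f_pop) * N_t)
```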
Reference: Beer, R.D. (1995), "A dynamical systems perspective on autonomous agents", Artificial Intelligence 72, 173-215.
Work by David Ackley and Michael Littman
Reference: Ackley, D. and Littman, M., "Interactions between learning and evolution", in Artificial Life II, SFI Studies in the Sciences of Complexity, vol. X, edited by C.G. Langton, C. Taylor, J.D. Farmer, & S. Rasmussen, Addison-Wesley, 1991.
Studied the combined effects of evolution and learning within a simulated world
Used a 2-dimensional grid world containing agents, carnivores, food sources, and obstacles
Each agent controlled by a pair of neural networks specified by its genome: the Action network and the Evaluation network
The Evaluation network has fixed, genetically determined weights; the Action network has learnable weights (initial values given by the genome) and is trained during the agent's lifetime using Complementary Reinforcement Backpropagation (CRBP), with reinforcement derived from the Evaluation network (see the summary below)
Summary of Evolutionary Reinforcement Learning
To produce a new individual (Birth):
Pick an agent A from the population.
If some agent B is physically close enough to A, then A and B mate to produce offspring C via standard 2-point crossover and mutation. If no other agent is sufficiently close to A, then A is simply cloned and mutated to produce offspring C.
Translate C's genome into a pair of neural networks: an "evaluation network" with fixed weights and an "action network" with learnable weights (with initial weight values specified by the genome).
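A sketch of the birth step under assumed conventions (Python; a real-valued genome, a simple split of the genome into the two weight sets, and illustrative mating-radius and mutation parameters, none of which are specified in the original):

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Agent:
    # Illustrative representation; the original work encodes the networks differently.
    x: float
    y: float
    genome: list
    eval_weights: list      # fixed for the agent's lifetime
    action_weights: list    # learnable; the genome supplies only the initial values

def make_networks(genome):
    """Hypothetical translation: first half -> evaluation net, second half -> action net."""
    half = len(genome) // 2
    return list(genome[:half]), list(genome[half:])

def two_point_crossover(g1, g2):
    """Standard 2-point crossover: the child copies the middle segment from the second parent."""
    i, j = sorted(random.sample(range(len(g1) + 1), 2))
    return g1[:i] + g2[i:j] + g1[j:]

def mutate(genome, pm=0.01, scale=0.1):
    """Perturb each gene independently with probability pm."""
    return [g + random.gauss(0, scale) if random.random() < pm else g for g in genome]

def give_birth(a, population, mating_radius=1.0):
    """Mate with a nearby agent if one exists, otherwise clone; then mutate and translate."""
    nearby = [b for b in population
              if b is not a and math.hypot(a.x - b.x, a.y - b.y) <= mating_radius]
    if nearby:
        child_genome = two_point_crossover(a.genome, random.choice(nearby).genome)
    else:
        child_genome = list(a.genome)                 # no partner close enough: clone
    child_genome = mutate(child_genome)
    ew, aw = make_networks(child_genome)
    return Agent(x=a.x, y=a.y, genome=child_genome, eval_weights=ew, action_weights=aw)
```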
To update an individual's action network weights (Day-to-day learning):
Let input(t) be a vector of real numbers encoding an agent's current situation at time t, and let output(t) be a binary vector encoding some action for the agent to take in response to input(t). output(t) is determined by the agent's action network, and is a stochastic function of input(t).
1. The agent evaluates its current situation by running input(t) through its evaluation network to produce a value E(t).
2. If there is no previous situation (i.e., if the agent has just been born), go to Step 5; otherwise calculate the reinforcement value R(t) = E(t) - E(t-1). A positive reinforcement value means that the agent thinks its situation has improved since the previous time step; a negative value means that it thinks things are getting worse.
3. If R(t) is positive, then whatever the agent did on the previous time step t-1 was a good thing for it to do (in its opinion), so strengthen its action network weights a little so that the agent will be more likely to generate the action output(t-1) given input(t-1). On the other hand, if R(t) is negative, then strengthen the action network weights a little so that the agent will be more likely to generate an action opposite to output(t-1) given input(t-1).
4. Try out the updated weights by generating a new "hypothetical" output vector based on input(t-1). If R(t) is positive but the new output differs from output(t-1), then the weights need to be strengthened a little more, so go back to Step 3. Similarly, if R(t) is negative but the new output is the same as output(t-1), then the weights need to be strengthened a little more in the opposite direction, so go back to Step 3. Otherwise, go on to the next step.
5. The agent generates a response to the current situation by running input(t) through its action network to produce an action output(t), which it then performs.
6. Increment t and go to Step 1.
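A simplified sketch of this learning loop (Python assumed; single sigmoid units stand in for the actual network architectures, the weight update is a generic "strengthen toward the target output" rule rather than Ackley and Littman's exact CRBP formulation, and the retry cap in Steps 3-4 is added here for safety):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def evaluate(eval_w, x):
    """Evaluation network: a fixed linear-sigmoid unit mapping the situation to E(t)."""
    return sigmoid(sum(w * xi for w, xi in zip(eval_w, x)))

def action_probs(action_w, x):
    """Action network: one sigmoid unit per output bit (illustrative architecture)."""
    return [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in action_w]

def sample_output(probs):
    """output(t) is a stochastic function of input(t): sample each bit independently."""
    return [1 if random.random() < p else 0 for p in probs]

def strengthen(action_w, x, target, lr=0.1):
    """Nudge each output unit's weights so its bit becomes more likely to equal target."""
    for ws, p, t in zip(action_w, action_probs(action_w, x), target):
        for i, xi in enumerate(x):
            ws[i] += lr * (t - p) * xi

def live(eval_w, action_w, sense, act, steps=100):
    """Steps 1-6 above; the retry loop in Steps 3-4 is capped at 20 iterations."""
    prev_E = prev_x = prev_out = None
    for t in range(steps):
        x = sense(t)
        E = evaluate(eval_w, x)                              # Step 1
        if prev_E is not None:                               # Step 2
            R = E - prev_E
            if R != 0:
                target = prev_out if R > 0 else [1 - b for b in prev_out]
                for _ in range(20):
                    strengthen(action_w, prev_x, target)     # Step 3
                    if sample_output(action_probs(action_w, prev_x)) == target:
                        break                                # Step 4: behavior matches, move on
        out = sample_output(action_probs(action_w, x))       # Step 5
        act(out)
        prev_E, prev_x, prev_out = E, x, out                 # Step 6: next time step
    return action_w
```

For example, live([0.5, -0.3], [[0.1, 0.2], [0.0, -0.1]], sense=lambda t: [1.0, random.random()], act=lambda out: None) runs a 2-input agent with a 2-bit action output.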