Supervised Learning
Pattern association
Pattern classification
Unsupervised Learning
Clustering and categorization
Data compression
Topographical feature-mapping
Supervised Learning
Perceptrons
Pattern associators
Backpropagation networks
Recurrent networks
Unsupervised Learning
Competitive learning networks
Self-organizing feature maps
Binary output value (typically ±1 or 0/1).
Output of perceptron categorizes input pattern x.
w0 weight acts as an adjustable threshold or bias.
Output is determined by a linear combination of the input values xi.
Each set of weights corresponds to a particular decision surface in the n-dimensional input space.
In order for a perceptron to be able to categorize a set of examples correctly, the examples must be linearly separable: a decision surface must exist that completely separates the positive examples from the negative examples.
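As a concrete illustration (not from the original notes), hand-picked weights w0 = −1.5, w1 = w2 = 1 give a perceptron that computes logical AND; the decision line x1 + x2 = 1.5 separates the single positive example from the three negative ones:

```python
# Illustrative sketch: a two-input perceptron computing logical AND.
# The weights are one of many choices that place the decision line
# x1 + x2 = 1.5 between the positive and negative examples.

def step(x):
    """Threshold function Theta: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def and_perceptron(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    """out = Theta(w0 + w1*x1 + w2*x2); w0 plays the role of the bias."""
    return step(w0 + w1 * x1 + w2 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", and_perceptron(x1, x2))   # only (1, 1) gives 1
```

No single line can split XOR's positive and negative examples this way, which is the single-layer limitation noted later in the section.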
Perceptron Learning Rule
Initialize perceptron weights to small random values.
Choose a pattern from the training set.
Apply the pattern to the perceptron inputs and compute its classification.
sum = ∑i wi × xi
out = Θ(sum), where Θ is the threshold (step) function
If pattern classification is incorrect, update perceptron's weights according to
Δwi = η × (target − out) × xi
winew = wiold + Δwi
where η is a small constant (~ 0.1) called the learning rate.
Go to step 2 and repeat for the next pattern until all patterns are classified correctly.
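The steps above translate almost line-for-line into code. The sketch below (function and variable names are my own) trains a perceptron on the AND problem; any linearly separable training set would do:

```python
import random

def step(x):
    """Threshold function Theta: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def train_perceptron(patterns, eta=0.1, max_epochs=100):
    """Perceptron Learning Rule, following the steps listed above.

    patterns: list of (inputs, target) pairs; each input tuple starts with a
    constant 1 so that w[0] acts as the adjustable bias/threshold w0.
    """
    n = len(patterns[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]       # step 1
    for _ in range(max_epochs):
        errors = 0
        for x, target in patterns:                            # step 2
            out = step(sum(wi * xi for wi, xi in zip(w, x)))  # step 3
            if out != target:                                 # step 4
                errors += 1
                for i in range(n):
                    w[i] += eta * (target - out) * x[i]
        if errors == 0:           # step 5: every pattern classified correctly
            break
    return w

# Logical AND, with a constant 1 prepended to each input for the bias weight w0
and_data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
print(train_perceptron(and_data))
```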
Perceptron Convergence Theorem
If training examples are linearly separable and η is small enough, PLR will converge in a finite number of steps to a set of weights that correctly classifies all examples.
Learns to associate input patterns with output patterns.
+1 -1 -1 +1 ("image of steak") ----> -1 -1 +1 +1 ("smell of steak") -1 +1 -1 +1 ("image of rose") ----> -1 +1 +1 -1 ("smell of rose")
Gradient-Descent Learning Algorithm
Initialize network weights to small random values.
Choose a pattern association A -> B from the training set.
+1 -1 -1 +1 ("image of steak") ----> -1 -1 +1 +1 ("smell of steak")
Apply pattern A to the input layer and propagate activation to the output layer.
sumi = ∑j aj × wj,i
outi = f (sumi)
aj are the activations of each input unit j
wj,i are the weights from each input unit j to output unit i
f (x) is a differentiable activation function such as
f (x) = x or f (x) = 1 / ( 1 + e^(−x) )
Compute the error (δ) values for each output unit by comparing their activations to the target pattern B.
δi = ( targeti − outi ) × f '(sumi)
targeti is the ith component of target pattern B
outi is the activation of output unit i
f '(sumi) is the derivative of the activation function (equals 1 if f(x) = x)
Update all connection strengths.
Δwj,i = η × δi × aj
wj,inew = wj,iold + Δwj,i
Go to step 2 and repeat for the next pattern association until overall error E is low enough, where
E = ½ × ∑patterns ∑i ( targeti − outi )²
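A minimal sketch of this procedure with the linear activation f(x) = x, using the steak/rose associations above as the training set (the NumPy layout and names are my own):

```python
import numpy as np

# Training associations (input pattern A -> target pattern B), from the notes:
# "image of steak" -> "smell of steak", "image of rose" -> "smell of rose"
A = np.array([[+1, -1, -1, +1],
              [-1, +1, -1, +1]], dtype=float)
B = np.array([[-1, -1, +1, +1],
              [-1, +1, +1, -1]], dtype=float)

eta = 0.1
rng = np.random.default_rng(0)
W = rng.uniform(-0.05, 0.05, size=(A.shape[1], B.shape[1]))   # step 1

for epoch in range(100):
    E = 0.0
    for a, target in zip(A, B):            # step 2: pick an association
        out = a @ W                        # step 3: f(x) = x, so out_i = sum_i
        delta = (target - out) * 1.0       # step 4: f'(sum) = 1 for linear f
        W += eta * np.outer(a, delta)      # step 5: dw_(j,i) = eta * delta_i * a_j
        E += 0.5 * np.sum((target - out) ** 2)
    if E < 1e-4:                           # step 6: stop when overall error is low
        break

print(np.round(A @ W, 2))                  # should closely approximate B
```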
Learning algorithm performs a gradient-descent search in weight space.
Ability to generalize behavior to novel inputs, beyond the original training patterns.
Resistance to noise.
Graceful degradation.
Can learn to behave as if following a rule.
Single-layer networks suffer from limitations (example: XOR problem).
Multi-layer networks can overcome these limitations using the backpropagation learning algorithm.
Continuous, differentiable, non-linear activation function.
Nice property: σ '(x) = (1 − σ(x)) × σ(x)
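For reference, this identity follows directly from the definition σ(x) = 1 / ( 1 + e^(−x) ):

```latex
\begin{align*}
\sigma(x)  &= \frac{1}{1 + e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{(1 + e^{-x})^{2}}
            = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
            = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{align*}
```

The middle factor e^(−x) / ( 1 + e^(−x) ) equals 1 − σ(x), which gives the identity.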
Multi-layer networks can represent highly nonlinear decision surfaces.
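For instance (a hand-built example, not from the notes), two threshold hidden units are already enough to carve out the non-convex decision region that XOR requires:

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    """Two threshold hidden units (OR and AND) feeding one output unit.
    The output fires for OR-but-not-AND, i.e. XOR.  Weights are illustrative."""
    h_or  = step(x1 + x2 - 0.5)            # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)            # fires only if both inputs are 1
    return step(h_or - 2.0 * h_and - 0.5)  # fires when h_or = 1 and h_and = 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```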
Backpropagation Learning Algorithm
Initialize network weights to small random values.
Choose a pattern association A -> B from the training set.
Apply pattern A to the input layer and propagate activation through the network to the output layer.
ai = σ( ∑j aj × wj,i )
aj are the activations of each unit j in the previous layer
wj,i are the incoming weights to unit i from each unit j in the previous layer
σ(x) = 1 / ( 1 + e^(−x) )
Compute the error (δ) values for each output unit by comparing their activations to the target pattern B.
δi = ( targeti − outi ) × ( 1 − outi ) × outi
targeti is the ith component of target pattern B
outi is the activation ai of output unit i
Propagate errors backwards through the network.
δj = ( ∑i δi × wj,i ) × (1 − aj ) × aj
Update all connection strengths.
Δwj,i = η × δi × aj
wj,inew = wj,iold + Δwj,i
Go to step 2 and repeat for the next pattern association until overall error E is low enough, where
E = ½ × ∑patterns ∑i ( targeti − outi )²
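A compact NumPy sketch of the full procedure for a single hidden layer, trained here on XOR to show the multi-layer network escaping the single-layer limitation. The names, layer sizes, and the trick of folding biases in as an extra constant input are my own choices; like any gradient method, the run can occasionally stall in a poor local minimum.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
eta, n_hid = 0.5, 4

# XOR training set; a constant 1 is appended to each input so the last
# row of each weight matrix acts as a bias.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W_ih = rng.uniform(-0.5, 0.5, size=(3, n_hid))         # step 1: small random weights
W_ho = rng.uniform(-0.5, 0.5, size=(n_hid + 1, 1))

for epoch in range(20000):
    E = 0.0
    for a_in, target in zip(X, T):                     # step 2: pick an association
        a_hid = sigmoid(a_in @ W_ih)                   # step 3: forward pass
        a_hid_b = np.append(a_hid, 1.0)                #   (append the bias unit)
        a_out = sigmoid(a_hid_b @ W_ho)
        d_out = (target - a_out) * (1 - a_out) * a_out        # step 4: output deltas
        d_hid = (W_ho[:n_hid] @ d_out) * (1 - a_hid) * a_hid  # step 5: backpropagate
        W_ho += eta * np.outer(a_hid_b, d_out)                # step 6: weight updates
        W_ih += eta * np.outer(a_in, d_hid)
        E += 0.5 * np.sum((target - a_out) ** 2)
    if E < 0.01:                                       # step 7: stop when E is low
        break

for a_in, target in zip(X, T):                         # check the trained network
    a_hid_b = np.append(sigmoid(a_in @ W_ih), 1.0)
    print(a_in[:2], target, np.round(sigmoid(a_hid_b @ W_ho), 2))
```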
Feedback connections in addition to feed-forward connections.
Equivalent to a dynamical system.
Feedback connections maintain state (short-term memory).
Networks can learn to recognize or generate temporal sequences of patterns.
Difficult to train in general.
Hopfield networks model associative memory.
Fully-recurrent architecture with symmetric weights.
Weights are determined beforehand by the set of patterns to be memorized.
Network starts with a corrupted or partially-complete pattern.
Network dynamics cause complete pattern to be recalled.
Each stored pattern acts as an attractor in an n-dimensional space, where n is the number of units.
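A small sketch of storage and recall, assuming the standard Hebbian outer-product rule for setting the weights (the helper names are mine):

```python
import numpy as np

def store(patterns):
    """Hebbian outer-product rule: w_(i,j) = sum over patterns of x_i * x_j,
    with a zero diagonal.  The resulting weight matrix is symmetric."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=100, seed=0):
    """Asynchronous updates: repeatedly pick a unit at random and set it to the
    sign of its net input (keeping its value when the net input is 0).  The
    dynamics settle into one of the stored attractor patterns."""
    rng = np.random.default_rng(seed)
    state = state.copy()
    for _ in range(steps):
        i = rng.integers(len(state))
        h = W[i] @ state
        if h != 0:
            state[i] = 1 if h > 0 else -1
    return state

patterns = np.array([[+1, -1, +1, -1, +1, -1],
                     [+1, +1, -1, -1, +1, +1]])
W = store(patterns)

corrupted = np.array([+1, -1, +1, -1, -1, -1])  # first pattern, one unit flipped
print(recall(W, corrupted))                     # recovers [+1 -1 +1 -1 +1 -1]
```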
Elman networks (also called Simple Recurrent Networks or SRNs) combine the advantages of recurrent connections with backpropagation training.
Single set of feedback connections with weights fixed at 1.
Output of network at time t depends on state of hidden layer at time t − 1 (in addition to input pattern).
Network can learn to predict sequences that depend on more than just the immediately previous input. Example: A B C B A B C B A ...
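A minimal sketch of the SRN forward dynamics; the class and parameter names are illustrative, and the backpropagation training of the input-to-hidden, context-to-hidden, and hidden-to-output weights is omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanForward:
    """Forward dynamics of a Simple Recurrent Network.

    The context units hold a copy of the hidden layer from the previous time
    step, made through fixed 1-to-1 copy connections; the context-to-hidden
    weights (W_ctx) are ordinary trainable weights.
    """

    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in  = rng.uniform(-0.5, 0.5, size=(n_in,  n_hid))
        self.W_ctx = rng.uniform(-0.5, 0.5, size=(n_hid, n_hid))
        self.W_out = rng.uniform(-0.5, 0.5, size=(n_hid, n_out))
        self.context = np.zeros(n_hid)           # hidden state at time t - 1

    def step(self, x):
        hidden = sigmoid(x @ self.W_in + self.context @ self.W_ctx)
        self.context = hidden.copy()             # fixed copy-back connections
        return sigmoid(hidden @ self.W_out)

# One-hot encoding of the sequence A B C B A B C B ...
symbols = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
net = ElmanForward(n_in=3, n_hid=5, n_out=3)
for s in "ABCBABCB":
    prediction = net.step(np.array(symbols[s], dtype=float))
    print(s, np.round(prediction, 2))
    # An untrained net predicts poorly; backprop on (output, next-symbol) pairs
    # would let it learn that what follows B depends on what preceded it.
```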