Recursive Auto-Associative Memory (RAAM)

 



Training a RAAM Network


Sequential RAAM


Example: Sequence ABB
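
The mechanics here are standard for Pollack-style RAAMs: a sequential RAAM folds a sequence into a fixed-width vector by repeatedly compressing [previous context; next symbol] with an encoder, starting from a fixed code for the empty sequence, while a decoder is trained auto-associatively to reconstruct the encoder's inputs. A minimal numpy sketch of the ABB example follows; the dimensions, symbol codes, and (untrained) weights are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    SYM, HID = 3, 5                        # assumed sizes, for illustration only
    W_enc = rng.normal(size=(HID, HID + SYM)) * 0.1
    W_dec = rng.normal(size=(HID + SYM, HID)) * 0.1
    EMPTY = np.zeros(HID)                  # fixed code for the empty sequence []

    symbol = {"A": np.array([1.0, 0.0, 0.0]),
              "B": np.array([0.0, 1.0, 0.0])}

    def encode(seq):
        # Fold left-to-right: h_t = f([h_{t-1}; x_t])
        h = EMPTY
        for s in seq:
            h = np.tanh(W_enc @ np.concatenate([h, symbol[s]]))
        return h

    def decode_step(h):
        # Split a code back into (previous-context, last-symbol) estimates
        out = np.tanh(W_dec @ h)
        return out[:HID], out[HID:]

    # For ABB: H1 = enc(EMPTY, A); H2 = enc(H1, B); H3 = enc(H2, B)
    code = encode("ABB")

Training (not shown) backpropagates the reconstruction error so that decode_step(encode(s)) recovers the last symbol and the code of the remaining prefix; decoding therefore peels symbols off one at a time, last first.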




A Case Study of RAAM



Sentence Generation


A simple grammar generated 2- and 3-word sentences using 26 words (15 nouns, 11 verbs) plus an end-marker, similar to Elman (1990); a generator sketch appears below.

Templates used by the sentence generator:

Example sentences:
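
A minimal generator sketch, assuming Elman-style templates (NOUN VERB and NOUN VERB NOUN) and a hypothetical fragment of the vocabulary:

    import random

    # Hypothetical vocabulary fragment; the real grammar used 15 nouns and 11 verbs.
    NOUNS = ["tarzan", "jane", "banana", "lion", "rock"]
    VERBS = ["eat", "see", "chase", "flee", "sleep"]
    TEMPLATES = [("N", "V"), ("N", "V", "N")]   # assumed 2- and 3-word templates
    END = "<end>"                               # end-marker symbol

    def generate():
        t = random.choice(TEMPLATES)
        words = [random.choice(NOUNS if slot == "N" else VERBS) for slot in t]
        return words + [END]

    print(generate())   # e.g. ['jane', 'see', 'banana', '<end>']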



Network Architecture


27-bit localist representation of words

Sequential RAAM with 30 hidden units

We don't want to commit to a particular structuring of the input (e.g., a parse tree); we want the network to discover good representations of sentences on its own.
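
Concretely, the dimensions work out as follows (a sketch; the vocabulary ordering is an assumption): the 26 words plus the end-marker get one bit each, and each encoder step compresses a 30-unit context plus a 27-bit word into 30 hidden units, making the encoder a 57-to-30 compressor and the decoder its 30-to-57 inverse.

    import numpy as np

    SYM, HID = 27, 30                 # 26 words + end-marker; 30 hidden units
    VOCAB = ["tarzan", "jane", "eat", "see"]          # fragment, for illustration
    onehot = {w: np.eye(SYM)[i] for i, w in enumerate(VOCAB)}

    W_enc = np.zeros((HID, HID + SYM))                # weights would be learned
    def step(context, word):
        # One sequential-RAAM step: (30 + 27)-dim input -> 30-dim context
        return np.tanh(W_enc @ np.concatenate([context, onehot[word]]))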

TRAINING set:  100 random sentences
TESTING set:  100 different sentences


Network Performance


100 TRAINING sentences presented in random order for ~ 21,000 epochs

Test 1:  Encoding and decoding

Test 2:  Ungrammatical sentences


Analysis of Encoded Sentences


Performed cluster analysis of 100 encoded TRAINING sentences

30-dimensional space --> tree structure
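
The notes don't specify the clustering method; a typical choice is agglomerative (hierarchical) clustering, which is what turns points in the 30-dimensional code space into a tree. A sketch with stand-in data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    codes = np.random.rand(100, 30)          # stand-in for the 100 encoded sentences
    tree = linkage(codes, method="average")  # agglomerative clustering -> binary tree
    dendrogram(tree, no_plot=True)           # plot instead to inspect the groupings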


Observations:


Composite Word Representations


An encoded representation was created for each of the 286 word occurrences appearing in the original 100 TRAINING sentences; a composite representation for each word was then formed by averaging over its occurrences.

Example:

tarzan eat banana
    [] tarzan              -->  H1
    [tarzan] eat           -->  H2
    [[tarzan] eat] banana  -->  H3

jane see tarzan
    [] jane                -->  H4
    [jane] see             -->  H5
    [[jane] see] tarzan    -->  H6

tarzan = average(H1, H6, ...)
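
In code, the averaging might look like the sketch below (untrained weights and a vocabulary fragment are assumed): the hidden state reached just after a word is absorbed is credited to that word, then averaged across all of its occurrences.

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(1)
    SYM, HID = 27, 30
    W_enc = rng.normal(size=(HID, HID + SYM)) * 0.1
    vocab = ["tarzan", "jane", "eat", "see", "banana"]    # fragment, illustration only
    onehot = {w: np.eye(SYM)[i] for i, w in enumerate(vocab)}

    def step(h, w):
        return np.tanh(W_enc @ np.concatenate([h, onehot[w]]))

    corpus = [["tarzan", "eat", "banana"], ["jane", "see", "tarzan"]]
    occurrences = defaultdict(list)
    for sentence in corpus:
        h = np.zeros(HID)
        for word in sentence:
            h = step(h, word)            # H1, H2, H3, ... in the notation above
            occurrences[word].append(h)

    # tarzan = average(H1, H6, ...)
    composite = {w: np.mean(hs, axis=0) for w, hs in occurrences.items()}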


Observations:


Cluster analysis of composite word representations:

Observations:


Holistic Operations on Distributed Representations


Can useful information be extracted directly from distributed representations, without first decoding them into their constituent parts?

Experiments involving three types of operations were performed: feature detection, parallel decoding, and syntactic transformations.

Feature Detection

Can particular features of distributed representations be recognized?


1. Aggressive-animal detector

2. Aggressive-animal and human detector

3. Reflexive sentence detector
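
Each detector can be realized as a small classifier over the 30-dimensional sentence codes. A sketch of the aggressive-animal detector as plain logistic regression (the actual detector architecture and the labels here are assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    codes = rng.random((100, 30))                   # stand-in encoded sentences
    labels = rng.integers(0, 2, 100).astype(float)  # 1 = contains an aggressive animal

    w, b = np.zeros(30), 0.0
    for _ in range(1000):                 # gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(codes @ w + b)))
        grad = p - labels
        w -= 0.1 * codes.T @ grad / len(labels)
        b -= 0.1 * grad.mean()

Parallel Decoding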

Is the sequential peeling-off of one symbol at a time the only method for retrieving information from a distributed representation?


Trained a parallel decoding network on 50 of the encoded TRAINING sentences for ~ 7,200 trials

Tested on the other 50 encoded TRAINING sentences

Task:  Given a distributed representation as input, produce all words simultaneously on the output units (for 2-word sentences, the third-word units should show no activation)

Performance: 81% of sentences decoded correctly (i.e., with no word errors)

Errors usually involved incorrect words of the same grammatical type (noun/noun or verb/verb)

Components of distributed representations can be accessed directly
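
A sketch of such a parallel decoder (a single weight layer with sigmoid outputs is one plausible choice; the sizes follow the 27-bit words and 3-word maximum, the rest is assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    HID, SYM, MAXLEN = 30, 27, 3
    W = rng.normal(size=(MAXLEN * SYM, HID)) * 0.1   # weights would be trained

    def parallel_decode(code):
        # All words produced simultaneously; for 2-word sentences the third
        # bank of 27 word units should remain near zero activation.
        out = 1.0 / (1.0 + np.exp(-(W @ code)))
        return out.reshape(MAXLEN, SYM)   # one row of word-unit activity per slot

    words = parallel_decode(rng.random(HID)).argmax(axis=1)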

Syntactic Transformations

Can operations be performed "holistically" on distributed representations?

Task:  NOUN1 chase NOUN2  ==>  NOUN2 flee NOUN1

New RAAM training corpus:  20 chase/flee sentence pairs, 110 other sentences

Trained RAAM as before for ~ 3700 trials

4 novel chase/flee sentence pairs encoded using the trained RAAM

24 chase/flee pairs total (20 used for training RAAM, 4 novel)

Transformation network trained using 16 chase/flee pairs (~ 75 trials)

Transformation network tested on the 8 remaining chase/flee pairs (4 from the RAAM training corpus, 4 novel)
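
The transformation network itself maps code to code: given the 30-dimensional RAAM code of "NOUN1 chase NOUN2", it produces the code of "NOUN2 flee NOUN1" directly, with no decoding in between. A sketch that substitutes a least-squares linear map for the trained network (data and architecture are stand-ins):

    import numpy as np

    rng = np.random.default_rng(4)
    chase_codes = rng.random((16, 30))   # stand-ins for encoded "N1 chase N2"
    flee_codes  = rng.random((16, 30))   # stand-ins for encoded "N2 flee N1"

    # Fit the code -> code map directly; the study trained a network (~ 75 trials).
    T, *rest = np.linalg.lstsq(chase_codes, flee_codes, rcond=None)

    transformed = rng.random(30) @ T     # a code the RAAM decoder can then unpack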


Performance: