This lecture covered some simple extensions of back-propagation learning to stylized recurrent networks. We previewed the speech recognition task, which will be discussed in detail later. We then looked at some limitations of feed-forward nets, including their inability to learn grammar, and at the following general proof.
Shift invariance is the ability of a neural system to recognize a pattern independent of where it appears on the retina. It is generally understood that this property cannot be learned by neural network methods, but I have not seen a published proof. A "local" learning rule is one that updates the input weights of a unit as a function of the unit's own activity and some performance measure for the network on the training example. All biologically plausible learning rules, as well as all backprop variants, are local in this sense.
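As one concrete instance of a local rule (my choice of example, not one singled out in the lecture), the delta rule for a single linear unit changes weight w_i by rate * (target - output) * x_i. Each update uses only quantities available at the unit: that weight's own input, the unit's output, and an error signal for the current example. A minimal sketch, assuming a linear unit with no bias and a learning rate of 0.1:

```python
def delta_rule_update(weights, inputs, target, rate=0.1):
    """One delta-rule (LMS) step for a single linear unit.

    The change to weight i depends only on input i, the unit's output and
    the error (target - output), so the rule is local in the sense above.
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + rate * error * x for w, x in zip(weights, inputs)]


print(delta_rule_update([0.0, 0.0, 0.0], [1, 0, 1], target=1.0))  # [0.1, 0.0, 0.1]
```

Note that under this rule a weight whose input is 0 on an example is left unchanged by that example.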
It is easy to show that no local rule can learn shift invariance. Consider learning binary strings with one occurrence of the sequence 101 and otherwise all zeros. First consider length 5; there are only 3 positive examples: 10100, 01010, and 00101.
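The positive examples for any length n can be generated by sliding 101 across a field of zeros, which gives n - 2 of them. The sketch below (function names are mine) builds them that way and cross-checks the construction against a brute-force scan of all 2**n strings:

```python
from itertools import product


def positive_examples(n):
    """Length-n strings with one '101' and zeros everywhere else."""
    return ["0" * i + "101" + "0" * (n - 3 - i) for i in range(n - 2)]


def is_positive(s):
    """Direct membership test: exactly two 1s, two positions apart."""
    ones = [i for i, c in enumerate(s) if c == "1"]
    return len(ones) == 2 and ones[1] - ones[0] == 2


# Cross-check the sliding construction against every binary string of length n.
for n in range(3, 11):
    strings = ("".join(bits) for bits in product("01", repeat=n))
    brute = {s for s in strings if is_positive(s)}
    assert brute == set(positive_examples(n))

print(positive_examples(5))  # ['10100', '01010', '00101']
```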
The one-dimensional case of shift invariance can be handled by treating each string as a sequence and learning a finite-state acceptor. But the methods that work for this are not local or biologically plausible, and they don't extend to two dimensions.
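To show how small such an acceptor is, here is one written down by hand (not learned). It reads the string left to right and accepts exactly the strings of the form 0*1010*. The state names are illustrative; any missing transition goes to a rejecting dead state. The same four live states work for every string length and every position of the 101, which is why the sequential view gets shift invariance in one dimension for free.

```python
# state -> {symbol: next_state}; any transition not listed leads to rejection.
TRANSITIONS = {
    "start": {"0": "start", "1": "saw1"},  # leading zeros, then the first 1
    "saw1": {"0": "saw10"},                # must see the middle 0
    "saw10": {"1": "accept"},              # closing 1 completes "101"
    "accept": {"0": "accept"},             # only trailing zeros allowed
}


def accepts(s):
    """Run the acceptor over s; True only if it ends in the accept state."""
    state = "start"
    for ch in s:
        state = TRANSITIONS[state].get(ch)
        if state is None:  # fell into the dead state
            return False
    return state == "accept"


print([accepts(s) for s in ["10100", "01010", "00101", "10010"]])  # [True, True, True, False]
```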
The unlearnability of shift invariance is not a problem in practice because people use preprocessing, weight sharing or other techniques to get shift invariance where it is known to be needed. However, it does pose a problem for the brain and for theories that are overly dependent on learning.
Chapter 7 of P&E describes a hack, weight sharing, for getting around the shift-invariance problem when you know in advance what invariance is required. The techniques described are of general interest.
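The idea behind weight sharing can be sketched for the 101 strings: use one detector unit with the same three weights at every window position and OR (max) its responses, plus a second unit whose single weight is shared by every input to count the 1s. Both units respond identically wherever the pattern sits, so invariance is built in rather than learned. The weights (1, -1, 1), bias -1.5 and the count test are hand-set assumptions for illustration, not values from P&E:

```python
KERNEL = (1, -1, 1)  # shared window weights (hand-set, not learned)
BIAS = -1.5          # shared bias: only the window 1 0 1 gives a positive sum


def shift_invariant_detector(s):
    """Accept strings with one '101' and zeros elsewhere using shared weights."""
    x = [int(c) for c in s]
    # Unit A: the same KERNEL and BIAS applied at every offset, then OR-pooled.
    window_hits = [
        sum(k * v for k, v in zip(KERNEL, x[i:i + 3])) + BIAS > 0
        for i in range(len(x) - 2)
    ]
    # Unit B: a weight of 1 shared by every input, counting the 1s.
    ones = sum(x)
    return any(window_hits) and ones == 2


print([shift_invariant_detector(s) for s in ["10100", "0001010", "11101", "10010"]])
# [True, True, False, False]
```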
It is easy to see that the "101" strings can be recognized by a simple Finite State Automaton (FSA). In Chapter 8, P&E present a simple extension of feed-forward nets, called Elman nets, that do a pretty lame job of learning FSAs, but again the techniques are widely used in PDP modeling. We discuss some other simple variations of feed-forward nets and, time permitting, the general case of learning recurrent networks. The most widely used technique for learning recurrent networks, back-propagation through time (BPTT), is supported by Tlearn but not discussed in P&E. A future assignment will ask you to try it.
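The distinguishing feature of an Elman net is its context units, which hold a copy of the previous step's hidden activations and feed back into the hidden layer alongside the next input. A minimal forward-pass sketch follows; the random untrained weights, the absence of biases and the all-zero initial context are my assumptions, and Tlearn's conventions may differ. In Elman's scheme the context units are treated as ordinary extra inputs and trained with standard back-propagation at each step, which is not shown here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rng, rows, cols):
    """rows x cols list of weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class ElmanNet:
    """Forward pass of a simple recurrent (Elman) network, without biases."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = random_matrix(rng, n_hidden, n_in)       # input -> hidden
        self.w_ctx = random_matrix(rng, n_hidden, n_hidden)  # context -> hidden
        self.w_out = random_matrix(rng, n_out, n_hidden)     # hidden -> output
        self.n_hidden = n_hidden

    def run(self, sequence):
        context = [0.0] * self.n_hidden  # assumed initial context
        outputs = []
        for x in sequence:
            hidden = [sigmoid(dot(wi, x) + dot(wc, context))
                      for wi, wc in zip(self.w_in, self.w_ctx)]
            outputs.append([sigmoid(dot(wo, hidden)) for wo in self.w_out])
            context = hidden  # copy hidden activations into the context units
        return outputs


net = ElmanNet(n_in=1, n_hidden=3, n_out=1)
for symbol, (y,) in zip("00101", net.run([[int(c)] for c in "00101"])):
    print(symbol, round(y, 3))
```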