Networks, and perhaps infants, can learn algebraic rules by detecting coincidences

Marcus et al. (1) habituated 7-month-old infants to sequences (sentences) of nonsense syllables (words) conforming either to the pattern ABA or to the pattern ABB (e.g., "ga ti ga" or "ga ti ti"). Subsequently, the infants were presented with additional sentences consisting entirely of new words, half conforming to the ABA pattern and half to the ABB pattern. Marcus et al. found that infants habituated to the ABA pattern showed a marked preference for the ABB pattern, and vice versa.

The experimental results led Marcus et al. to conclude that infants are capable of extracting and using abstract algebraic rules such as "the first item X is the same as the third item Y". Such an algebraic rule represents a relationship between placeholders or variables for which one can substitute arbitrary values. The experimental results also suggested that infants are able to extract algebraic rules rapidly from small amounts of data.
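To make the notion of a rule over variables concrete, here is a minimal Python sketch. The function names `is_aba` and `is_abb` are hypothetical, and the nonsense words other than "ga ti ga" are made up for illustration. The extra requirement that the first two items differ is an assumption read off the pattern names, not something stated by Marcus et al.

```python
def is_aba(sentence):
    """Rule 'the first item is the same as the third item'.

    Assumption: the first and second items must also differ,
    as the pattern name ABA suggests.
    """
    first, second, third = sentence
    return first == third and first != second


def is_abb(sentence):
    """Rule 'the second item is the same as the third item' (same assumption)."""
    first, second, third = sentence
    return second == third and first != second


# The rules mention only positions, never particular words,
# so they apply unchanged to words that were never seen before.
habituation_sentence = ("ga", "ti", "ga")
novel_aba = ("wo", "fe", "wo")   # illustrative new words
novel_abb = ("wo", "fe", "fe")   # illustrative new words

print(is_aba(habituation_sentence), is_aba(novel_aba), is_aba(novel_abb))
```

Because the predicates refer to serial positions rather than to word identity, substituting arbitrary values for the placeholders leaves the rule intact, which is the sense of "algebraic" used here.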

Marcus et al. pointed out that while most popular neural network models excel at capturing statistical patterns and regularities in data, they are incapable of extracting algebraic rules that generalize to new items (2). They noted, however, that certain types of connectionist architectures (3,4) that encode relationships between variables could extract such rules.

We have constructed a connectionist network architecture (5), derived from (3,6), that can readily acquire algebraic rules. The extracted rules are not tied to features of the words used during habituation, and they generalize to new words. Furthermore, the network acquires rules from a small number of examples, without using negative evidence and without any pretraining.

Perhaps most significantly, the proposed model identifies a sufficient set of architectural and representational conditions that transform the problem of learning algebraic rules into the much simpler problem of learning to detect relevant coincidences within a spatiotemporal pattern. Our work suggests that even abstract algebraic rules can be grounded in concrete and basic notions such as spatial and temporal location, and coincidence.

The representational and architectural conditions identified by the model are as follows:

  1. There exist nodes that encode serial position within a sequence. Recent findings suggest that such nodes are biologically plausible (7).
  2. The network can express bindings between a positional node and the item that occupies this position in a given sequence.
  3. The bindings are expressed via temporal synchrony, that is, the occurrence of an item A in a particular position P in a sequence is coded by the synchronous activity of the cells encoding A and cells encoding P. There is considerable evidence that synchronization of neural activity might underlie the encoding of bindings (3,8).
  4. Nodes representing positional roles and items are interconnected via recurrent connections mediated by intermediate (or hidden) layers of nodes.
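
The conditions above can be illustrated with a small symbolic sketch in Python. This is not the network described in (5): it has no weights, hidden layers, or learning dynamics. It only assumes that each serial position fires in its own discrete phase and that an item node fires in the phase of every position it occupies. All names (`bind`, `coincidences`, `habituate`, `is_familiar`) and the words other than "ga ti ga" are hypothetical.

```python
from itertools import combinations


def bind(sentence):
    """Assumed binding scheme: position i fires in phase i, and each item
    node fires in the phases of the positions it occupies."""
    firing = {}
    for phase, item in enumerate(sentence):
        firing.setdefault(item, set()).add(phase)
    return firing


def coincidences(sentence):
    """Position pairs whose nodes fire in synchrony with the same item node."""
    pairs = set()
    for phases in bind(sentence).values():
        pairs.update(combinations(sorted(phases), 2))
    return frozenset(pairs)


def habituate(sentences):
    """Keep the coincidences shared by every habituation sentence.
    Positive examples only; expects at least one sentence."""
    return frozenset.intersection(*(coincidences(s) for s in sentences))


def is_familiar(rule, sentence):
    """A test sentence is familiar if it shows exactly the habituated coincidences."""
    return coincidences(sentence) == rule


rule = habituate([("ga", "ti", "ga"), ("li", "na", "li")])
print(sorted(rule))
print(is_familiar(rule, ("wo", "fe", "wo")), is_familiar(rule, ("wo", "fe", "fe")))
```

In this toy version the "rule" is just a set of coinciding position pairs, so it carries no information about the words themselves. That is why it transfers to sentences made entirely of new words, and why a few positive examples suffice.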

Lokendra Shastri
International Computer Science Institute
1947 Center Street, #600
Berkeley, CA 94704, USA
(510) 642-4274 ext 310
E-mail: shastri@icsi.berkeley.edu

    References and Notes

    1. G.F. Marcus et al., Science 283, 77 (1999).

    2. Comments on (1) by M.S. Seidenberg and J.L. Elman; M. Negishi; and P.D. Eimas, and response by G.F. Marcus, Science 284, 434 (1999).

    3. L. Shastri and V. Ajjanagadde, Behav. Brain Sci. 16, 417 (1993).

    4. J.E. Hummel and K.J. Holyoak, Psychol. Rev. 104, 427 (1997).

    5. Details of the network architecture and simulation may be found in a report by L. Shastri and S. Chang, available at http://www.icsi.berkeley.edu/~shastri/babytalk.

    6. L. Shastri, Technical Report TR-97-003, International Comp. Sci. Inst., Berkeley, June 1997.

    7. A.F. Carpenter et al., Science 283, 1752 (1999).

    8. W. Singer and C.M. Gray, Annu. Rev. Neurosci. 18, 556 (1995); M. Usher and N. Donnelly, Nature 394, 179 (1998); C. von der Malsburg, Technical Report 81-2, Max-Planck Inst. for Biophys. Chem., Göttingen, Germany, 1981.