Marcus et al. (1) habituated 7-month-old infants to sequences (sentences) of nonsense syllables (words) conforming either to the pattern ABA or to the pattern ABB (e.g. ``ga ti ga'' or ``ga ti ti''). Subsequently, the infants were presented with additional sentences consisting entirely of new words, with half the sentences conforming to the ABA pattern and half to the ABB pattern. Marcus et al. found that infants habituated to the ABA pattern showed a marked preference for the ABB pattern, and vice versa.
The experimental results led Marcus et al. to conclude that infants are capable of extracting and using abstract algebraic rules such as ``the first item X is the same as the third item Y''. Such an algebraic rule represents a relationship between placeholders, or variables, for which one can substitute arbitrary values. The results also suggested that infants are able to extract algebraic rules rapidly from small amounts of data.
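To make the notion of a rule over variables concrete, the following minimal Python sketch treats a pattern such as ABA as a set of identity relations between sentence positions, so that any words may be substituted for A and B. The function names and the word list other than ``ga'' and ``ti'' are illustrative assumptions; the sketch is not a description of the stimuli or procedure used in (1).

```python
import random

def make_sentence(pattern, words):
    """Bind the variables A and B to two distinct words and fill in the pattern."""
    a, b = random.sample(words, 2)  # two different words, chosen at random
    binding = {"A": a, "B": b}
    return [binding[slot] for slot in pattern]

def matches(pattern, sentence):
    """True if the sentence has the same identity relations as the pattern.

    Only sameness or difference between positions is compared,
    never the words themselves, so the check applies to unseen words.
    """
    if len(pattern) != len(sentence):
        return False
    n = len(pattern)
    return all(
        (pattern[i] == pattern[j]) == (sentence[i] == sentence[j])
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical word list for illustration only.
WORDS = ["ga", "ti", "na", "li"]
example = make_sentence("ABA", WORDS)
```

Because `matches` never inspects the identity of a word, a sentence built from entirely new words is accepted or rejected on the same basis as a habituation sentence.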
Marcus et al. pointed out that while most popular neural network models excel at capturing statistical patterns and regularities in data, they are incapable of extracting algebraic rules that generalize to new items (2). They noted, however, that certain types of connectionist architectures (3,4) that encode relationships between variables could extract such rules.
We have constructed a connectionist network architecture (5) derived from (3,6) which can readily acquire algebraic rules. The extracted rules are not tied to features of words used during habituation, and generalize to new words. Furthermore, the network acquires rules from a small number of examples, without using negative evidence, and without any pretraining.
Perhaps the most significant aspect of the proposed model is that it identifies a sufficient set of architectural and representational conditions that transform the problem of learning algebraic rules into the much simpler problem of learning to detect relevant coincidences within a spatiotemporal pattern. Our work suggests that even abstract algebraic rules can be grounded in concrete and basic notions such as spatial and temporal location, and coincidence.
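The idea of reducing rule learning to coincidence detection can be illustrated with a small Python sketch. It is only a toy under stated assumptions and is not the network described in (5): each word is assumed to activate a set of feature units, each sentence position is assumed to occupy its own time step, and a coincidence is assumed to mean that the same units are active at two time steps. The feature codes and function names below are hypothetical.

```python
from itertools import combinations

def encode(sentence, features):
    """Spatiotemporal pattern: the set of active feature units at each time step."""
    return [frozenset(features[word]) for word in sentence]

def coincidences(pattern):
    """Pairs of time steps at which exactly the same units are active."""
    return {
        (i, j)
        for i, j in combinations(range(len(pattern)), 2)
        if pattern[i] == pattern[j]
    }

def habituate(sentences, features):
    """Keep only the coincidences present in every habituation sentence."""
    observed = [coincidences(encode(s, features)) for s in sentences]
    return set.intersection(*observed)

def is_familiar(sentence, features, learned):
    """A test sentence is familiar if it shows the learned coincidences and no others."""
    return coincidences(encode(sentence, features)) == learned

# Hypothetical feature codes (assumed for illustration, not taken from the model).
FEATURES = {
    "ga": {"g", "a"}, "ti": {"t", "i"},
    "li": {"l", "i"}, "na": {"n", "a"},
    "wo": {"w", "o"}, "fe": {"f", "e"},
}

# Habituation on ABA sentences only: no negative evidence is used.
LEARNED = habituate([["ga", "ti", "ga"], ["li", "na", "li"]], FEATURES)
```

In this toy setting, what is learned is a relation between positions (the first and third time steps coincide) rather than anything about the habituation words, so test sentences built from ``wo'' and ``fe'' are classified by the same criterion.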
The representational and architectural conditions identified by the model are as follows:
Lokendra Shastri International Computer Science Institute 1947 Center Street, #600 Berkeley, CA 94704, USA. (510) 642-4274 ext 310. E-mail: shastri@icsi.berkeley.edu
1. G.F. Marcus et al., Science 283, 77 (1999).
2. Comments on (1) by M.S. Seidenberg and J.L. Elman; M. Negishi; and P.D. Eimas, and response by G.F. Marcus, Science 284, 434 (1999).
3. L. Shastri and V. Ajjanagadde, Behav. Brain Sci. 16, 417 (1993).
4. J.E. Hummel and K.J. Holyoak, Psychol. Rev. 104, 427 (1997).
5. Details of the network architecture and simulation may be found in a report by L. Shastri and S. Chang, available at http://www.icsi.berkeley.edu/~shastri/babytalk.
6. L. Shastri, Technical Report TR-97-003, International Comp. Sci. Inst., Berkeley, June 1997.
7. A.F. Carpenter et al., Science 283, 1752 (1999).
8. W. Singer and C.M. Gray, Annu. Rev. Neurosci. 18, 556 (1995); M. Usher and N. Donnelly, Nature 394, 179 (1998); C. von der Malsburg, Technical Report 81-2, Max-Planck Inst. for Biophys. Chem., Göttingen, Germany, 1981.