Lecture 5. Connectionist Models 
 

February 3, 1999

At this point in the course, the introductory material is finished, and we will concentrate on computing. We will go back later and see how these things relate to the problems discussed in the first part of the course. So far, we've had three lectures on brain mechanisms, development, and methods of experimentation. We have also had three lectures on high-level linguistic and cognitive issues. The rest of the course will focus on how we can bridge the gap between the brain and language and thought. This will be done with models.

Models are necessary for abstracting away from the extreme concrete level, which is too detailed to be used for modeling system behavior. Even the biologists who aren't interested in language and thought make models. Models abstract some of the detail, so that you can concentrate on a certain part of the system without getting swamped in the details. Models should be predictive of behavior and tractable; you should be able to tell from the model what the behavior of the model will be. We're interested in computational models because we're interested in understanding the computations of the brain. In particular this class is concerned with connectionist models.

A connectionist model is a particular kind of model with computational units, which are abstractions from neurons, arranged in networks, which are abstractions from the dendritic and axonal connections in the brain. A sample biological model was shown which simulates actual neural spikes. The form of the model depends on which things you choose to abstract and which you choose to study. With the big gap between language/thought and the brain, there are many kinds of models which you could choose to make.

There are two distinct approaches to this. You could worry only about a behavioral model and not worry about the structure at all. If you want a structured model, you need to make a structured connectionist model, which is what we'll do in this class. You could also build a computational model of language or thought which doesn't pay attention to what's going on in the brain. This is conventional AI, which tries to build computational models for a practical purpose or to understand language and thought better, but doesn't worry about how the model might map onto the brain. Biologists make models which map onto the brain, but are not particularly interested in understanding language and thought. We're interested in models which try to explain language and thought processes and actually map onto the brain. A slide of a biological model of higher-level control of reflexes was shown. It shows low-level excitatory and inhibitory connections between neurons. The big move is to make models of language and thought with brain-like computational primitives, carried out in a massively parallel connectionist computational style.

Sigmund Freud developed the first connectionist model. He was trained as a biologist and he thought that to understand the psyche, you need a scientific theory. His theory was basically connectionist and had to do with spreading activation. He abandoned the project in order to keep the field of psychology from being taken over by the biologists and medical doctors. This theory makes sense of some of his later notions of repression, flow, etc. Hebb was a later connectionist. So here's the idea of making connectionist models of higher thought.

Previous connectionist models of language and thought which we have seen are the Necker cube model, the word superiority model, etc. There was a slide which showed a connectionist model of how you might speak a phrase. The slide shows two versions of the model, so there are lots of ways to do these kinds of models. The MacKay model has nodes which self-inhibit, similar to the latency period of neurons. Eikmeyer and Schade didn't use self-inhibition, but they had mutual inhibition and winner-take-all networks. So there are many choices for what exactly goes into your model. The details don't matter that much from model to model; it's like Turing equivalence, in that different models will exhibit the same behavior. However, if you make a biological model, the details matter a lot because you're making predictions about what the biology does. Psychologists don't normally make those predictions. They say that the kinds of computations in their models are the best ways to explain the psychological problem at hand.

***NTL slide***

The slide is from a series of papers that the NTL group gave at the last Cognitive Science conference. At the top is the level of language and cognition; at the bottom is the level of the brain and structured connectionism is in the middle. The biologists use the same techniques to model biology. We are interested in computational level models which can be reduced to connectionist models and finally brain models. The course will teach you to understand this slide.

There was a slide of a silicon retina, a detailed analog simulation of a brain system. People have tried to use this as a fancy t.v. camera which is self-adaptive, but there hasn't been much success. If anyone is interested in pursuing this, see Jerry for more information.

There is another way to look at this modeling problem as a computer scientist. A slide was shown of a standard computational model with rules which doctors use for differential diagnosis of flu vs. cold vs. sinusitis. The rules are, for example: "If high fever and headache, then add head problem. If head problem and runny nose, then add flu." This is standard computer science. If you wanted to do the same kind of thing as a connectionist network, it might look like the next slide, which has a picture of a network with symptom units across the bottom as input, the diseases across the top as output and hidden units in between. The connections are weighted. The weights can be designed by hand or the networks can be made to learn weights with the back-propagation technique. We will practice building both hand-designed and learning networks with T-learn. This shows the contrast between how you might solve a problem with standard computational models and with connectionist models which are reducible to the brain level.
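The rule-based side of this contrast can be sketched in a few lines of Python. This is only an illustration, not the system on the slide: it encodes the two rules quoted above and applies them repeatedly until no new facts are added. The names RULES and infer are made up for this sketch, and the slide's rules for cold and sinusitis are not reproduced in these notes, so they are left out.

```python
# Each rule is (conditions that must all hold, fact to add).
# Only the two rules quoted in the notes are encoded here.
RULES = [
    (frozenset({"high fever", "headache"}), "head problem"),
    (frozenset({"head problem", "runny nose"}), "flu"),
]


def infer(symptoms):
    """Apply the rules repeatedly until no rule adds a new fact."""
    facts = set(symptoms)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


if __name__ == "__main__":
    print(sorted(infer({"high fever", "headache", "runny nose"})))
    print(sorted(infer({"runny nose"})))
```

Each rule either fires or it doesn't. The connectionist version replaces these all-or-nothing rules with weighted connections from symptom units to disease units.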

Historically, around 1945, the end of W.W.II, some scientists began working on making intelligent machines. Some of them chose to use standard computer science methods with logic as a basis. They built rules for characterizing intelligence. The others tried to make their machines as much like the brain as possible, assuming that the machines will get intelligent that way. There was a slide which listed from a computational point of view the advantages of the standard computer science method: it is clear how to do representation; inference is clear from logical rules; the algorithms are universal. The advantages of neural networks are that they can adapt or learn, are massively parallel and robust; because they have redundant connections, they are much more error tolerant than standard programs. Structured connectionist models try to take advantage of the best properties of both approaches. By structuring the models, you can get the representation and inference of conventional computer science and the connectionist element offers the advantages of the neural networks.

Neural networks were originally very exciting because the first people to study them said that they would be able to learn what was necessary without having to be programmed, but Minsky and Papert proved that this wouldn't work for the simple networks then in use. Neural networks fell out of favor, and conventional AI was pursued. A renaissance of neural networks occurred in the late seventies, partly triggered by some of the psychological models as well as physics and computer science models. There has been influence between neural network people and conventional AI people. For instance, today many AI models are quantitative, using probabilistic methods such as Bayes nets, rather than logic.

A slide was put up which shows a unit of a neural network. The net input xj of unit j is calculated as the weighted sum of its inputs. The output is yj. w1j is the weight from unit one to unit j. The bias, a unit constantly providing input to j, is the equivalent of a threshold in the unit that determines whether it fires. The weight from the bias to the unit is the negative of what the threshold would be. The T-learn program uses bias units.
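As a minimal sketch of the unit described here, the Python below computes the weighted sum and checks that a bias weight equal to the negative of the threshold gives the same firing decision as an explicit threshold. The function names and the example weights are assumptions made for illustration. They are not taken from the slide or from T-learn.

```python
def weighted_sum(inputs, weights):
    """Sum of each input times the weight on its connection."""
    return sum(w * x for w, x in zip(weights, inputs))


def net_input(inputs, weights, bias_weight):
    """Weighted sum plus the bias unit, which always sends 1."""
    return weighted_sum(inputs, weights) + bias_weight * 1.0


def fires_with_threshold(inputs, weights, threshold):
    """Explicit threshold: fire if the weighted sum exceeds it."""
    return weighted_sum(inputs, weights) > threshold


def fires_with_bias(inputs, weights, threshold):
    """Same decision, with the threshold folded into a bias weight."""
    return net_input(inputs, weights, -threshold) > 0


if __name__ == "__main__":
    weights = [1.0, 1.0]   # hypothetical weights w1j, w2j
    threshold = 1.5        # hypothetical threshold
    for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(inputs,
              fires_with_threshold(inputs, weights, threshold),
              fires_with_bias(inputs, weights, threshold))
```

Folding the threshold into a weight means the threshold can be adjusted by the same procedure that adjusts every other weight.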

***step function*** (see reader)

There are different kinds of output functions which can be used. The threshold unit uses the equation

yj = 1, if xj > 0
yj = 0, otherwise

This is graphed as a step function. It models pretty well the actual firing behavior of a neuron. It might be used to model the spiking behavior of neurons, as in the model shown earlier. The perceptron is based on this output function, although the term perceptron is today applied to units which don't have this output function. The problem with this function is that the strength of neural activation, which is important for models such as the Necker cube, would have to be modeled as the number of spikes. Then the model has to account for timing and other messy problems. Other output functions work better for cognitive models.

***linear function*** (see reader)

The linear output function is yj = a xj, where a is a constant: the output is proportional to the weighted sum of the inputs. This output function is used in pronunciation models. The advantages of this are that you can do the same thing as with the threshold function, and it is used in control theory, so a lot is known about it. Also, instead of having the output be a spike, it is closer to spike frequency. Such models don't have to account for timing, etc. There are some disadvantages: linear systems grow without bounds. If a model has a positive feedback loop with these units, then the output will grow infinitely. No physical systems do this. Also, a strictly linear system doesn't get additional computational power from additional layers. This is a big restriction of computational power. Also, we will want output functions which are differentiable (the step function is not). This will become important for the network learning rule, to be discussed later.
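Both disadvantages can be checked with a small Python sketch. The weight matrices W1 and W2 and the helper names are invented for this illustration. The first part shows that two layers of linear units compute exactly what a single layer with combined weights computes. The second shows a linear unit feeding its own output back through a loop whose gain is greater than 1.

```python
def matvec(matrix, vector):
    """Outputs of a layer of linear units (slope a = 1)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Plain matrix product, used to merge two layers into one."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 2 inputs -> 3 hidden units
W2 = [[1.0, 0.0, 2.0]]                      # 3 hidden units -> 1 output
W_COMBINED = matmul(W2, W1)                 # single equivalent layer


def two_layer(x):
    return matvec(W2, matvec(W1, x))


def one_layer(x):
    return matvec(W_COMBINED, x)


def feedback(gain, steps, start=1.0):
    """A linear unit whose output is fed back as its only input."""
    y = start
    history = [y]
    for _ in range(steps):
        y = gain * y
        history.append(y)
    return history


if __name__ == "__main__":
    print(two_layer([1.0, 1.0]), one_layer([1.0, 1.0]))
    print(feedback(2.0, 5))
```

With a gain below 1 the same loop decays toward zero instead, so the unbounded growth is specific to positive feedback with a gain above 1.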

***sigmoid function*** (see reader)

The sigmoid output function is shown above (see reader). Its advantages are that it is bounded, staying between 0 and 1, and that it is differentiable, which makes it mathematically much easier to work with than the step function. The learning technique, back-propagation, requires a smooth, differentiable output function like the sigmoid.
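The exact formula is in the reader and is not reproduced in these notes. The sketch below assumes the standard logistic form, y = 1 / (1 + e^(-x)), whose derivative can be written in terms of the output itself as y(1 - y). The function names are chosen for this example only.

```python
import math


def sigmoid(x):
    """Logistic function; this form is an assumption, see the reader."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite the formula to avoid overflow in exp().
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """Slope of the logistic function, expressed through its output."""
    y = sigmoid(x)
    return y * (1.0 - y)


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(x, sigmoid(x), sigmoid_derivative(x))
```

The slope is largest at a net input of 0 and flattens out as the output approaches 0 or 1.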

There was a demonstration of T-learn, a program which is used for the current and next homeworks. T-learn is designed for building neural networks which can learn their weights. In class, a "logical OR" network was built. The network had two inputs of 0 or 1, a bias unit and one output. If there is at least one 1 in the input, the output should be 1. The weights were first fixed by hand, then they were learned. Features of the program were demonstrated, including training and testing options and examples of network structure files. There is an introduction to T-learn in the reader.
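The OR exercise can be sketched outside T-learn as follows. This is not T-learn's own procedure: to keep the example short and guaranteed to converge, it uses a threshold unit and the simple perceptron learning rule rather than back-propagation. The hand-set weights (1, 1, bias -0.5), the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR


def step(net):
    return 1 if net > 0 else 0


def output(weights, bias_weight, inputs):
    """Threshold unit with two inputs and a bias unit that always sends 1."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return step(net)


# Weights fixed by hand: either input alone is enough to exceed 0.
HAND_WEIGHTS, HAND_BIAS = [1.0, 1.0], -0.5


def train(patterns, epochs=20, rate=1.0):
    """Perceptron rule: nudge each weight by rate * error * input."""
    weights, bias_weight = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in patterns:
            error = target - output(weights, bias_weight, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias_weight += rate * error
    return weights, bias_weight


if __name__ == "__main__":
    learned_weights, learned_bias = train(PATTERNS)
    for inputs, target in PATTERNS:
        print(inputs, target,
              output(HAND_WEIGHTS, HAND_BIAS, inputs),
              output(learned_weights, learned_bias, inputs))
```

Changing the targets in PATTERNS to 0, 0, 0, 1 turns the same sketch into the AND problem from the homework.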

The OR function and the AND function for the homework are possible with the simple networks demonstrated in class, which have no hidden units. However, networks like this cannot learn the EXCLUSIVE OR function. This is because an output unit with no hidden units draws a single line through the input space: inputs on one side of the line give output 1 and inputs on the other side give output 0. If the four possible input pairs for OR are graphed, they can be separated by a single line, above which the output is 1, below which the output is 0. (These graphs can be seen on p. 124 in reading 10 from the reader, McClelland and Rumelhart, 1989.) This is not true for a graph of inputs for EXCLUSIVE OR. The middle set of inputs, (0,1) and (1,0), has to be separated out from the outer inputs, (0,0) and (1,1), on the graph, and no single line can do that, so a network without hidden units can't get the right results. However, if an extra layer of units between the input and output, called hidden units, is added, the network can do EXCLUSIVE OR. These multi-layer networks can learn any function on a plane, but lots of functions are not on a plane; for instance, the function for walking has loops in it. These networks we have seen are feed-forward with no loops, so they are still limited, but they can do much more than their predecessors. This increase in computational power has led to many practical applications of neural networks, and back-propagation is also used widely in cognitive modeling. But back-propagation is not biologically plausible and the kinds of nets it works on fail to capture much of what we need to describe the relation between language and the brain. The next two lectures will describe various kinds of structure and learning in connectionist networks and some of their applications.
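The EXCLUSIVE OR point can be made concrete with a hand-designed sketch. All weights below are assumptions chosen for illustration, not weights from the lecture or learned by T-learn. One hidden unit computes OR, the other computes AND, and the output unit fires when OR is on but AND is off. The search at the end tries a coarse grid of weights for a single unit with no hidden layer and finds none that work. That is an illustration rather than a proof, although the result holds for every choice of weights.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def unit(inputs, weights, bias_weight):
    """Threshold unit: 1 if weighted sum plus bias exceeds 0, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return 1 if net > 0 else 0


def xor_net(x1, x2):
    """Two hidden units (OR and AND) feeding one output unit."""
    h_or = unit((x1, x2), (1.0, 1.0), -0.5)
    h_and = unit((x1, x2), (1.0, 1.0), -1.5)
    return unit((h_or, h_and), (1.0, -1.0), -0.5)


def single_unit_solutions(grid):
    """All (w1, w2, bias) on the grid for which one unit computes XOR."""
    return [(w1, w2, b)
            for w1, w2, b in product(grid, repeat=3)
            if all(unit(x, (w1, w2), b) == t for x, t in XOR.items())]


GRID = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0


if __name__ == "__main__":
    for (x1, x2), target in XOR.items():
        print((x1, x2), target, xor_net(x1, x2))
    print(len(single_unit_solutions(GRID)))
```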