April 7, 1998
If you have a sentence such as 'Harry bought a radio' or 'The radio cost $50,' the verbs of these sentences are defined relative to the same frame, the commercial event frame. 'Bought' picks out an action in this frame and has an agent, which is its subject. 'Cost' picks out a relation in the frame and doesn't have an agent, but it has a subject. For 'cost' the subject is the good of the commercial event frame. In general, when a verb describes an action, its subject is an agent. This is a soft rule. The notion of agent is not in the commercial event frame because it is more general than that frame. It is a higher order notion that has to do with generalized actions, but in the commercial event frame, you have specific actions. The general notion of agent requires its own frame, and you have to link that frame to the more specific commercial event frame.
The choice of a verb doesn't just pick out a frame, it picks out what you are focusing on in the frame. 'Buy' and 'sell' both have agents as subjects, but they pick out different agents with respect to the commercial event frame. 'Buy' focuses on the buyer and 'sell' focuses on the seller.
The two systems you have seen before take different perspectives. Regier's system was done from a third person perspective. VerbLearn was done from a first person perspective. There is a part of the brain which brings together actions which you do and your recognition of those actions in other people. There is a neural mechanism which allows us to use the neural structures of first person actions to recognize third person actions. Remember the experiment with the monkey in which the same neurons fired when the monkey pressed a button and when the monkey saw the experimenter press a button. The cognitive science talk on Friday, April 17 will be given by Brian MacWhinney, who does research on this issue.
The system of aspect is also a higher order structure for events and actions. For sentence structures and frames, there is also a higher order structure with notions such as agent, patient, goal, source, etc. For grammar, notions such as agent, patient, goal, source are a big deal. For instance, indirect objects tend to be goals, and ablatives (in languages with cases) tend to be sources. There's a higher level structure that functions relative to grammar but it doesn't map one to one onto grammar. Agents are not always subjects. So what are the principles linking notions like subjects and notions like agents? There are many different theories about what these principles are, but they must exist.
In case languages, you get agents, patients, goal, source, beneficiary, location, time, manner, etc. One question is what is the higher order frame structure used semantically to map these notions onto grammar. We've seen how a higher order aspect frame is mapped onto verbs. We've also seen a higher order tense frame. And we know there's a higher order frame for thematic roles (agent, patient, etc.). We need a way to connect these frames to grammar and also to more specific frames such as the commercial event frame. The neural structure which accomplishes this is not completely known, but the systems examined in class, such as VerbLearn and Narayanan's system, give an account of some of these things. VerbLearn showed how the properties of verbs are mapped onto systems for doing actions.
So we need to be able to link sentences to actions to frame structures and we need to be able to link lexical items to all of these.
With respect to features, verbs had to do with object properties and action properties. Some of these properties include force, fragility of the object, mass of the object, hand posture, aspect, manner. These are features of the verb and the values of the features are the parameters. The verb 'cost' has a direct object and a subject. It picks out the commercial event frame. Its subject is the good; its object is the price. It can take an optional indirect object, which is the buyer: 'It cost me $50.' The verb picks out parts of frames and marks certain parts as subjects, objects, etc. All verbs do this universally, so there must be some kind of link between verbs and the correct parts of semantic frames and between verbs and grammar. There has to be some neural binding mechanism to do this job. All of this information, though, doesn't have to be specified for each particular verb. There are generalizations within and across languages.
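The linking just described for 'cost' can be written down as a small table from grammatical functions to frame roles. The sketch below is only an illustration of that idea, not a model from the course: the role names, the VerbEntry class and the bind function are assumptions made for this example, and the objects of 'buy' and 'sell' are filled in from ordinary English usage rather than from the lecture.

```python
from dataclasses import dataclass

# Roles of the commercial event frame (names chosen for this sketch).
COMMERCIAL_EVENT = {"buyer", "seller", "good", "price"}


@dataclass
class VerbEntry:
    lemma: str
    frame: set     # roles of the frame the verb picks out
    linking: dict  # grammatical function -> frame role


def bind(entry, **arguments):
    """Map the filler of each grammatical function onto its frame role."""
    bindings = {}
    for function, filler in arguments.items():
        if function not in entry.linking:
            raise ValueError(f"'{entry.lemma}' takes no {function}")
        bindings[entry.linking[function]] = filler
    return bindings


# Linking for 'cost' follows the notes; 'buy' and 'sell' objects are assumed.
COST = VerbEntry("cost", COMMERCIAL_EVENT,
                 {"subject": "good", "object": "price", "indirect_object": "buyer"})
BUY = VerbEntry("buy", COMMERCIAL_EVENT, {"subject": "buyer", "object": "good"})
SELL = VerbEntry("sell", COMMERCIAL_EVENT, {"subject": "seller", "object": "good"})

# 'It cost me $50.'
print(bind(COST, subject="it", indirect_object="me", object="$50"))
# 'Harry bought a radio.'
print(bind(BUY, subject="Harry", object="a radio"))
```

The same frame serves all three verbs; only the linking table differs, which is one way to picture how 'buy' and 'sell' focus on different participants.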
For a given language, generalizations can be constructions, such as the active construction or the passive construction. These specify the thematic role of the subjects, among other things. These general, higher-level constructions and principles of grammar are represented neurally.
Verbs tend to pick out frames (in English), but there are 'light verbs' which seem to work with many frames. For instance, in 'do a repair job' the verb 'do' will not pick out the frame, but 'a repair job' will. In 'make a decision,' 'make' will not pick out the frame but 'decision' will. These light verbs have their own systems. Other languages have only a few general verbs, so the nouns in general pick out the frames. Some languages have few nouns, so the verbs carry most of the information. Cross-linguistically, a verb tends to be the locus of aspect.
A neural account of these things has to take into account our ability to understand phrases which don't adhere strictly to grammatical rules. These things need to be learned and represented neurally. In addition, sometimes we can make conscious decisions about processes which are usually unconscious. The issue of consciousness is not being addressed in this course. There currently isn't any way to explain the phenomenon of consciousness neurally.
In VerbLearn, you have feature structures with features such as which X-schema is used. Picking out an X-schema is like picking out a frame, but it's one that lets you do something. The other features, such as force, etc., have to do with parameters of the action in the X-schema frame or object properties, such as size. Neurally, a feature structure is a set of triangle nodes with connections. The structure of the neural circuit tells how parameters are used in the system and how lexical items link to those parameters. The representations of parameters are distributed throughout the brain. The feature structures are notations for triangle nodes with connections to word phonology and to fine-grained motor programs. How are frames represented neurally? Can X-schemas be used to represent a portion of a frame, for instance, the scenario portion?
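As a notation, a feature structure of this kind can be pictured as a small table of feature-value pairs, one of which names the X-schema. The sketch below is a rough illustration only: the feature values are invented, and the labeling step uses plain feature overlap, which is a stand-in chosen for brevity and not the learning procedure VerbLearn actually uses.

```python
# Hypothetical feature structures. Feature names follow the notes;
# the values are invented for illustration.
LEXICON = {
    "push":  {"schema": "slide", "force": "medium", "posture": "palm"},
    "shove": {"schema": "slide", "force": "high",   "posture": "palm"},
    "nudge": {"schema": "slide", "force": "low",    "posture": "index"},
}


def label_action(observed):
    """Return the verb whose feature structure agrees with the most observed values."""
    def score(features):
        return sum(observed.get(name) == value for name, value in features.items())
    return max(LEXICON, key=lambda verb: score(LEXICON[verb]))


print(label_action({"schema": "slide", "force": "high", "posture": "palm"}))
```

All three entries share one schema and differ only in parameter values, which is the sense in which the other features are parameters of the action.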
For every lexical item, you have to know what frame is being called up, what parts of the frame are picked out by the item's arguments, what the inherent aspect of the item is, and what features are the properties of the item (for instance, is it animate). Some features are frame internal requirements; for instance, "buyer" is a role with the property animate. In this case, lexical items can be chosen to match the properties of features of the frame. Some lexical items have constraints which don't come from the frame. For example, the verb 'cost' takes an indirect object as in 'that cost me $50'. It can't take a 'to' prepositional phrase as an alternative to the indirect object, as in *'that cost $50 to me'. This is a constraint on the lexical item, 'cost', not a constraint in the commercial event frame. You need to sort out the frame vs. lexical phenomena. If something is a constraint across the entire frame, then it's in the frame. At Berkeley, the FrameNet project is run by Charles Fillmore at ICSI. The point of the project is to set up a computerized dictionary which investigates how frames, grammar and lexical items would be represented and how they would be linked up with each other.
There are also constructional constraints on lexical items. Some verbs, such as 'rumor', occur only in the passive: 'he was rumored to have left'. Some verbs occur only in the passive when they have infinitives, for instance, 'he was said to have left'. So the notion of construction will have to be represented neurally as well.
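One way to keep frame constraints and lexical constraints apart is to store them in separate tables and report which table a violation came from. The following is a minimal sketch under that assumption. The table names and the violations function are made up for this example, and the entry for 'sell' (which allows both 'sold me a radio' and 'sold a radio to me') is added from ordinary English usage, not from the lecture.

```python
# Frame-internal requirement: holds for every word that evokes the frame.
FRAME_ROLE_PROPERTIES = {"buyer": {"animate": True}}

# Lexical constraint: how each verb lets the buyer be expressed.
BUYER_REALIZATIONS = {
    "cost": {"indirect_object"},           # 'that cost me $50'
    "sell": {"indirect_object", "to_pp"},  # 'sold me a radio', 'sold a radio to me'
}


def violations(verb, buyer_properties, buyer_realization):
    """Return (source, message) pairs; source is 'frame' or 'lexical'."""
    problems = []
    for prop, required in FRAME_ROLE_PROPERTIES["buyer"].items():
        if buyer_properties.get(prop) != required:
            problems.append(("frame", f"buyer must have {prop}={required}"))
    if buyer_realization not in BUYER_REALIZATIONS[verb]:
        problems.append(("lexical", f"'{verb}' does not allow {buyer_realization}"))
    return problems


# *'that cost $50 to me' fails on the lexical item, not on the frame.
print(violations("cost", {"animate": True}, "to_pp"))
```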
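The two constructional constraints above can be stated as simple conditions on voice and on the presence of an infinitive. This is a toy sketch: the rule table and the allowed function are hypothetical names, and verbs not listed are treated as unconstrained purely for convenience.

```python
# Each rule takes (voice, has_infinitive) and says whether the use is allowed.
CONSTRUCTIONAL_RULES = {
    # 'rumor' occurs only in the passive.
    "rumor": lambda voice, has_infinitive: voice == "passive",
    # 'say' must be passive when it takes an infinitive.
    "say": lambda voice, has_infinitive: voice == "passive" or not has_infinitive,
}


def allowed(verb, voice, has_infinitive=False):
    """Check a verb use against its constructional rule, if it has one."""
    rule = CONSTRUCTIONAL_RULES.get(verb)
    return True if rule is None else rule(voice, has_infinitive)


print(allowed("rumor", "passive", has_infinitive=True))  # 'he was rumored to have left'
print(allowed("say", "active", has_infinitive=True))     # *'they said him to have left'
```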
Discussion of some issues related to the VerbLearn model.
Lexical items such as 'slide' have polysemous senses, the most central of which is object centered (intransitive sense which describes the movement of the object) rather than agent centered. The VerbLearn model is only agent centered, so there is no way to represent the object-centered sense of 'slide'.
In any computational model, some aspects of the situation are factored out. For instance, different perspectives are not handled by VerbLearn. The action of pushing is very different depending on whether you are pushing, being pushed or observing a push. These are experienced very differently by the body, but they must be combined somehow.
Verbs such as 'slide' will pick out different pieces of the situation. For instance, they may describe the physical action, but they may also describe the kind of object generally acted on. The choice of verbs can depend on both. Another example of this in English is the large class of destruction verbs which depend on the way the object looks after the destruction (mince, chop, mangle, etc.). When children learn words, they may learn only the associated action and not attend to other parts of meaning. This is called manner dominance. For instance, they may learn that 'stir' involves a particular kind of action without also learning that the stuff being stirred should have some state as a result of being stirred. Such information would be learned later. The VerbLearn model also seems to show this kind of development in learning labels for actions.
In the VerbLearn model, 'slide' is a name for the underlying X-schema. It isn't meant to suggest that the 'slide' X-schema contains all the information which the lexical item 'slide' would contain.
X-schemas represent actions but they do not always contain enough information to capture the meaning of a lexical item. Some words, for instance, crucially refer to particular results or particular initial states (such as 'thaw'). So you would need a beginning state, action and final state which can all be used in combination for the labeling process. For instance, the verb 'peel' refers to a particular kind of action involving a particular object with a particular feature (it has a peelable skin). Verbs code this kind of information. This is an illustration of the interaction between the world state and bodily actions, which must be included in a model of lexical items.
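The combination of beginning state, action and final state can be sketched as a record with three parts, where a verb label applies only if all three fit the event. Everything specific here is an assumption for illustration: the VerbMeaning class, the schema names 'warm' and 'strip', and the state features are invented and are not part of VerbLearn.

```python
from dataclasses import dataclass, field


@dataclass
class VerbMeaning:
    schema: str                                        # name of the action schema
    initial_state: dict = field(default_factory=dict)  # required world state before
    final_state: dict = field(default_factory=dict)    # required world state after

    def matches(self, schema, before, after):
        """True if the action and both world states fit this verb."""
        return (schema == self.schema
                and all(before.get(k) == v for k, v in self.initial_state.items())
                and all(after.get(k) == v for k, v in self.final_state.items()))


# Hypothetical entries: 'thaw' needs a frozen start, 'peel' a peelable skin.
THAW = VerbMeaning("warm", {"frozen": True}, {"frozen": False})
PEEL = VerbMeaning("strip", {"has_peelable_skin": True}, {"skin_on": False})

print(THAW.matches("warm", {"frozen": True}, {"frozen": False}))
print(THAW.matches("warm", {"frozen": False}, {"frozen": False}))
```

The second call fails even though the action is the same, which is the point of the paragraph above: the action alone does not license the label.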
Some things which are not actions, such as goals and intentions, are not part of the model. There has to be some neural place in which we simulate what's happening in the world. An intention is often a simulation which you are trying to achieve. The latest versions of VerbLearn can model the physical world and other people.
Also, VerbLearn doesn't provide an explicit link between similar verbs which seem to be variations of the same action. There is an implicit connection in that they use the same schema with different parameters, but there is no indication that two verbs are in the same semantic field. Push, shove, nudge are in the same semantic field, as are pull, push, etc. In the AI field this would be called metaknowledge, that is, knowledge of the relationships between these verbs. Semantic field researchers claim that this kind of metaknowledge is part of the lexical information associated with the item. The metaknowledge of semantic fields helps a speaker decide which verb to choose for particular situations. It would be easy to add this kind of information to the VerbLearn system.
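As a rough picture of what adding this information could look like, semantic fields can be stored as named sets of verbs alongside the lexicon. This is a sketch only, not a description of any VerbLearn extension: the field name "pushing" and the helper functions are hypothetical.

```python
# Hypothetical semantic field table: field name -> member verbs.
SEMANTIC_FIELDS = {
    "pushing": {"push", "shove", "nudge"},
}


def fields_of(verb):
    """Names of all semantic fields that contain the verb."""
    return {name for name, members in SEMANTIC_FIELDS.items() if verb in members}


def same_field(verb_a, verb_b):
    """True if the two verbs share at least one semantic field."""
    return bool(fields_of(verb_a) & fields_of(verb_b))


def alternatives(verb):
    """Other verbs a speaker could choose from within the same field(s)."""
    related = set()
    for name in fields_of(verb):
        related |= SEMANTIC_FIELDS[name]
    return sorted(related - {verb})


print(same_field("push", "nudge"))
print(alternatives("push"))
```

A verb may belong to more than one field, so the lookup returns a set of field names.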